My primary goal in writing Understanding Analysis was to create an elementary
one-semester book that exposes students to the rich rewards inherent in
taking a mathematically rigorous approach to the study of functions of a real
variable. The aim of a course in real analysis should be to challenge and
improve mathematical intuition rather than to verify it. There is a tendency,
however, to center an introductory course too closely around the familiar
theorems of the standard calculus sequence. Producing a rigorous argument that
polynomials are continuous is good evidence for a well-chosen definition of
continuity, but it is not the reason the subject was created and certainly not the
reason  it  should  be  required  study.  By  shifting  the  focus  to  topics  where  an 
untrained  intuition  is  severely  disadvantaged  (e.g.,  rearrangements  of  infinite 
series,  nowhere-differentiable  continuous  functions,  Cantor  sets),  my  intent  is  to 
bring  an  intellectual  liveliness  to  this  course  by  offering  the  beginning  student 
access  to  some  truly  significant  achievements  of  the  subject. 


The  Main  Objectives 

Real analysis stands as a beacon of stability in the otherwise unpredictable
evolution of the mathematics curriculum. Amid the various pedagogical revolutions
in calculus, computing, statistics, and data analysis, nearly every undergraduate
program continues to require at least one semester of real analysis. My
own  department  once  challenged  this  norm  by  creating  a  mathematical  sciences 
track  that  allowed  students  to  replace  our  two  core  proof-writing  classes  with 
electives  in  departments  like  physics  and  computer  science.  Within  a  few  years, 
however,  we  concluded  that  the  pieces  did  not  hold  together  without  a  course  in 
analysis.  Analysis  is,  at  once,  a  course  in  philosophy  and  applied  mathematics. 
It  is  abstract  and  axiomatic  in  nature,  but  is  engaged  with  the  mathematics 
used  by  economists  and  engineers. 

How  then  do  we  teach  a  successful  course  to  students  with  such  diverse 
interests  and  expectations?  Our  desire  to  make  analysis  required  study  for  wider 
audiences  must  be  reconciled  with  the  fact  that  many  students  find  the  subject 
quite challenging and even a bit intimidating. One unfortunate resolution of this
dilemma is to make the course easier by making it less interesting. The omitted
material  is  inevitably  what  gives  analysis  its  true  flavor.  A  better  solution  is  to 
find  a  way  to  make  the  more  advanced  topics  accessible  and  worth  the  effort. 

I  see  three  essential  goals  that  a  semester  of  real  analysis  should  try  to  meet: 

1.  Students  need  to  be  confronted  with  questions  that  expose  the  insufficiency 
of  an  informal  understanding  of  the  objects  of  calculus.  The  need  for  a 
more  rigorous  study  should  be  carefully  motivated. 

2.  Having  seen  mainly  intuitive  or  heuristic  arguments,  students  need  to  learn 
what  constitutes  a  rigorous  mathematical  proof  and  how  to  write  one. 

3.  Most  importantly,  there  needs  to  be  significant  reward  for  the  difficult 
work of firming up the logical structure of limits. Specifically, real analysis
should not be just an elaborate reworking of standard introductory
calculus.  Students  should  be  exposed  to  the  tantalizing  complexities  of 
the  real  line,  to  the  subtleties  of  different  flavors  of  convergence,  and  to 
the  intellectual  delights  hidden  in  the  paradoxes  of  the  infinite. 

The  philosophy  of  Understanding  Analysis  is  to  focus  attention  on  questions 
that  give  analysis  its  inherent  fascination.  Does  the  Cantor  set  contain  any 
irrational  numbers?  Can  the  set  of  points  where  a  function  is  discontinuous 
be  arbitrary?  Are  derivatives  continuous?  Are  derivatives  integrable?  Is  an 
infinitely  differentiable  function  necessarily  the  limit  of  its  Taylor  series?  In 
giving  these  topics  center  stage,  the  hard  work  of  a  rigorous  study  is  justified 
by  the  fact  that  they  are  inaccessible  without  it. 


The  Audience 

This book is an introductory text. The only prerequisite is a robust understanding
of the results from single-variable calculus. The theorems of linear algebra
are  not  needed,  but  the  exposure  to  abstract  arguments  and  proof  writing  that 
usually  comes  with  this  course  would  be  a  valuable  asset.  Complex  numbers  are 
never  used. 

The  proofs  in  Understanding  Analysis  are  written  with  the  beginning  student 
firmly  in  mind.  Brevity  and  other  stylistic  concerns  are  postponed  in  favor 
of  including  a  significant  level  of  detail.  Most  proofs  come  with  a  generous 
amount  of  discussion  about  the  context  of  the  argument.  What  should  the 
proof  entail?  Which  definitions  are  relevant?  What  is  the  overall  strategy? 
Whenever  there  is  a  choice,  efficiency  is  traded  for  an  opportunity  to  reinforce 
some  previously  learned  technique.  Especially  familiar  or  predictable  arguments 
are  often  deferred  to  the  exercises. 

The  search  for  recurring  ideas  exists  at  the  proof-writing  level  and  also  on 
the  larger  expository  level.  I  have  tried  to  give  the  course  a  narrative  tone  by 
picking  up  on  the  unifying  themes  of  approximation  and  the  transition  from  the 
finite to the infinite. Often when we ask a question in analysis the answer is
“sometimes.” Can the order of a double summation be exchanged? Is term-by-term
differentiation of an infinite series allowed? By focusing on this recurring
pattern,  each  successive  topic  builds  on  the  intuition  of  the  previous  one.  The 
questions  seem  more  natural,  and  a  coherent  story  emerges  from  what  might 
otherwise  appear  as  a  long  list  of  theorems  and  proofs. 

This  book  always  emphasizes  core  ideas  over  generality,  and  it  makes  no 
effort  to  be  a  complete,  deductive  catalog  of  results.  It  is  designed  to  capture  the 
intellectual  imagination.  Those  who  become  interested  are  then  exceptionally 
well  prepared  for  a  second  course  starting  from  complex-valued  functions  on 
more  general  spaces,  while  those  content  with  a  single  semester  come  away  with 
a  strong  sense  of  the  essence  and  purpose  of  real  analysis. 


The  Structure  of  the  Book 

Although  the  book  finds  its  way  to  some  sophisticated  results,  the  main  body 
of  each  chapter  consists  of  a  lean  and  focused  treatment  of  the  core  topics 
that  make  up  the  center  of  most  courses  in  analysis.  Fundamental  results  about 
completeness,  compactness,  sequential  and  functional  limits,  continuity,  uniform 
convergence,  differentiation,  and  integration  are  all  incorporated. 

What is specific here is where the emphasis is placed. In the chapter on
integration, for instance, the exposition revolves around deciphering the relationship
between continuity and the Riemann integral. Enough properties of the integral
are obtained to justify a proof of the Fundamental Theorem of Calculus, but
the theme of the chapter is the pursuit of a characterization of integrable
functions in terms of continuity. Whether or not Lebesgue’s measure-zero criterion
is  treated,  framing  the  material  in  this  way  is  still  valuable  because  it  is  the 
questions  that  are  important.  Mathematics  is  not  a  static  discipline.  Students 
should  be  aware  of  the  historical  reasons  for  the  creation  of  the  mathematics 
they  are  learning  and  by  extension  realize  that  there  is  no  last  word  on  the 
subject.  In  the  case  of  integration,  this  point  is  made  explicitly  by  including 
some  relatively  modern  developments  on  the  generalized  Riemann  integral  in 
the  additional  topics  of  the  last  chapter. 

The  structure  of  the  chapters  has  the  following  distinctive  features. 

Discussion Sections: Each chapter begins with the discussion of some
motivating examples and open questions. The tone in these discussions is
intentionally informal, and full use is made of familiar functions and results from
calculus.  The  idea  is  to  freely  explore  the  terrain,  providing  context  for  the 
upcoming  definitions  and  theorems.  After  these  exploratory  introductions,  the 
tone  of  the  writing  changes,  and  the  treatment  becomes  rigorously  tight  but 
still  not  overly  formal.  With  the  questions  in  place,  the  need  for  the  ensuing 
development  of  the  material  is  well  motivated  and  the  payoff  is  in  sight. 

Project  Sections:  The  penultimate  section  of  each  chapter  (the  final  section  is 
a  short  epilogue)  is  written  with  the  exercises  incorporated  into  the  exposition. 
Proofs  are  outlined  but  not  completed,  and  additional  exercises  are  included  to 
elucidate the material being discussed. The sections are written as self-guided
tutorials, but they can also be the subject of lectures. I typically use them in
place of a final examination, and they work especially well as collaborative
assignments that can culminate in a class presentation. The body of each chapter
contains  the  necessary  tools,  so  there  is  some  satisfaction  in  letting  the  students 
use  their  newly  acquired  skills  to  ferret  out  for  themselves  answers  to  questions 
that  have  been  driving  the  exposition. 


Building  a  Course 

Although  this  book  was  originally  designed  for  a  12-14-week  semester,  it  has 
been  used  successfully  in  any  number  of  formats  including  independent  study. 
The  dependence  of  the  sections  follows  the  natural  ordering,  but  there  is  some 
flexibility  as  to  what  can  be  treated  and  omitted. 

•  The  introductory  discussions  to  each  chapter  can  be  the  subject  of  lecture, 
assigned  as  reading,  omitted,  or  substituted  with  something  preferable. 
There  are  no  theorems  proved  here  that  show  up  later  in  the  text.  I  do 
develop  some  important  examples  in  these  introductions  (the  Cantor  set, 
Dirichlet’s  nowhere-continuous  function)  that  probably  need  to  find  their 
way  into  discussions  at  some  point. 

•  Chapter  3,  Basic  Topology  of  R,  is  much  longer  than  it  needs  to  be.  All 
that  is  required  by  the  ensuing  chapters  are  fundamental  results  about 
open and closed sets and a thorough understanding of sequential
compactness. The characterization of compactness using open covers as well
as the section on perfect and connected sets are included for their own
intrinsic interest. They are not, however, crucial to any future proofs. The
one exception to this is a presentation of the Intermediate Value Theorem
(IVT) as a special case of the preservation of connected sets by continuous
functions. To keep connectedness truly optional, I have included two
direct  proofs  of  IVT  based  on  completeness  results  from  Chapter  1. 

• All the project sections (1.6, 2.8, 3.5, 4.6, 5.4, 6.7, 7.6, 8.1-8.6) are optional
in  the  sense  that  no  results  in  later  chapters  depend  on  material  in  these 
sections.  The  six  topics  covered  in  Chapter  8  are  also  written  in  this 
tutorial-style  format,  where  the  exercises  make  up  a  significant  part  of  the 
development.  The  only  one  of  these  sections  that  might  benefit  from  a 
lecture  is  the  unit  on  Fourier  series,  which  is  a  bit  longer  than  the  others. 


Changes  in  the  Second  Edition 

In  light  of  the  encouraging  feedback — especially  from  students — I  decided  not 
to  attempt  any  major  alterations  to  the  central  narrative  of  the  text  as  it  was 
set  out  in  the  original  edition.  Some  longer  sections  have  been  edited  down, 
or in one case split in two, and the unit on Taylor series is now part of the
core material of Chapter 6 instead of being relegated to the closing project
section.  In  contrast  to  the  main  body  of  the  book,  significant  effort  has  gone 
into  revising  the  exercises  and  projects.  There  are  roughly  150  new  exercises  in 
this  edition  alongside  200  or  so  of  what  I  feel  are  the  most  effective  problems 
from  the  first  edition.  Some  of  these  introduce  new  ideas  not  covered  in  the 
chapters  (e.g.,  Euler’s  constant,  infinite  products,  inverse  functions),  but  the 
majority  are  designed  to  kindle  debates  about  the  major  ideas  under  discussion 
in  what  I  hope  are  engaging  ways.  There  are  ample  propositions  to  prove  but 
also  a  good  supply  of  Moore-method  type  exercises  that  require  assessing  the 
validity  of  various  conjectures,  deciphering  invented  definitions,  or  searching  for 
examples  that  may  not  exist. 

The  introductory  discussion  to  Chapter  6  is  new  and  tells  the  story  of  how 
Euler’s  deft  and  audacious  manipulations  of  power  series  led  to  a  computation 
of $\sum 1/n^2$. Providing a proper proof for Euler’s sum is the topic of one of
three  new  project  sections.  The  other  two  are  a  treatment  of  the  Weierstrass 
Approximation  Theorem  and  an  exploration  of  how  to  best  extend  the  domain  of 
the  factorial  function  to  all  of  R.  Each  of  these  three  topics  represents  a  seminal 
achievement  in  the  history  of  analysis,  but  my  decision  to  include  them  has  as 
much  to  do  with  the  associated  ideas  that  accompany  the  main  proofs.  For  the 
Weierstrass  Approximation  Theorem,  the  particular  argument  that  I  chose  relies 
on  Taylor  series  and  a  deep  understanding  of  uniform  convergence,  making  it 
an ideal project to conclude Chapter 6. The journey to a proper definition of $x!$
allowed  me  to  include  a  short  unit  on  improper  integrals  and  a  proof  of  Leibniz’s 
rule  for  differentiating  under  the  integral  sign.  The  accompanying  topics  for  the 
project  on  Euler’s  sum  are  an  analysis  of  the  integral  remainder  formula  for 
Taylor series and a proof of Wallis’s famous product formula for $\pi$. Yes, these
are challenging arguments, but they are also beautiful ideas. Returning to the
thesis  of  this  text,  it  is  my  conviction  that  encounters  with  results  like  these 
make  the  task  of  learning  analysis  less  daunting  and  more  meaningful.  They 
make the epsilons matter.


The Real Numbers


1.1 Discussion: The Irrationality of $\sqrt{2}$

Toward  the  end  of  his  distinguished  career,  the  renowned  British  mathematician 
G.H.  Hardy  eloquently  laid  out  a  justification  for  a  life  of  studying  mathematics 
in A Mathematician’s Apology, an essay first published in 1940. At the center
of  Hardy’s  defense  is  the  thesis  that  mathematics  is  an  aesthetic  discipline.  For 
Hardy,  the  applied  mathematics  of  engineers  and  economists  held  little  charm. 
“Real  mathematics,”  as  he  referred  to  it,  “must  be  justified  as  art  if  it  can  be 
justified  at  all.” 

To  help  make  his  point,  Hardy  includes  two  theorems  from  classical  Greek 
mathematics,  which,  in  his  opinion,  possess  an  elusive  kind  of  beauty  that, 
although  difficult  to  define,  is  easy  to  recognize.  The  first  of  these  results  is 
Euclid’s  proof  that  there  are  an  infinite  number  of  prime  numbers.  The  second 
result is the discovery, attributed to the school of Pythagoras from around 500
B.C., that $\sqrt{2}$ is irrational. It is this second theorem that demands our attention.
(A  course  in  number  theory  would  focus  on  the  first.)  The  argument  uses  only 
arithmetic,  but  its  depth  and  importance  cannot  be  overstated.  As  Hardy  says, 
“[It]  is  a  ‘simple’  theorem,  simple  both  in  idea  and  execution,  but  there  is  no 
doubt  at  all  about  [it  being]  of  the  highest  class.  [It]  is  as  fresh  and  significant  as 
when  it  was  discovered — two  thousand  years  have  not  written  a  wrinkle  on  [it].” 

Theorem  1.1.1.  There  is  no  rational  number  whose  square  is  2. 

Proof. A rational number is any number that can be expressed in the form $p/q$,
where p and q are integers. Thus, what the theorem asserts is that no matter
how p and q are chosen, it is never the case that $(p/q)^2 = 2$. The line of attack
is  indirect,  using  a  type  of  argument  referred  to  as  a  proof  by  contradiction. 
The  idea  is  to  assume  that  there  is  a  rational  number  whose  square  is  2  and 
then proceed along logical lines until we reach a conclusion that is unacceptable.
At this point, we will be forced to retrace our steps and reject the erroneous
assumption  that  some  rational  number  squared  is  equal  to  2.  In  short,  we  will 
prove  that  the  theorem  is  true  by  demonstrating  that  it  cannot  be  false. 

And so assume, for contradiction, that there exist integers p and q satisfying

(1)  $(p/q)^2 = 2$.

We may also assume that p and q have no common factor, because, if they had
one, we could simply cancel it out and rewrite the fraction in lowest terms. Now,
equation (1) implies

(2)  $p^2 = 2q^2$.

From this, we can see that the integer $p^2$ is an even number (it is divisible by 2),
and hence p must be even as well because the square of an odd number is odd.
This allows us to write $p = 2r$, where r is also an integer. If we substitute $2r$
for p in equation (2), then a little algebra yields the relationship

$2r^2 = q^2$.

But now the absurdity is at hand. This last equation implies that $q^2$ is even,
and  hence  q  must  also  be  even.  Thus,  we  have  shown  that  p  and  q  are  both 
even  (i.e.,  divisible  by  2)  when  they  were  originally  assumed  to  have  no  common 
factor.  From  this  logical  impasse,  we  can  only  conclude  that  equation  (1)  cannot 
hold  for  any  integers  p  and  q,  and  thus  the  theorem  is  proved.  □ 
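The proof rules out every pair of integers at once, which no finite computation can do. Still, a quick search can make the claim concrete. The following Python sketch is only an illustration: the function name `find_rational_sqrt2` and the search bound are assumptions chosen here, not part of the text. For each denominator q it asks whether $2q^2$ is a perfect square, which is exactly the condition $p^2 = 2q^2$ from equation (2).

```python
from math import isqrt


def find_rational_sqrt2(max_q):
    """Return a pair (p, q) with p*p == 2*q*q and 1 <= q <= max_q, or None.

    Hypothetical helper for illustration only; Theorem 1.1.1 says the
    search must fail for every bound.
    """
    for q in range(1, max_q + 1):
        target = 2 * q * q
        p = isqrt(target)  # largest integer whose square is <= target
        if p * p == target:
            return (p, q)
    return None


# The parity fact used in the proof: the square of an odd number is odd.
odd_squares_are_odd = all((n * n) % 2 == 1 for n in range(1, 1000, 2))

print(find_rational_sqrt2(10_000))  # None
print(odd_squares_are_odd)          # True
```

An empty search is evidence, not proof. Only the argument by contradiction covers all integers p and q.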

A  component  of  Hardy’s  definition  of  beauty  in  a  mathematical  theorem 
is  that  the  result  have  lasting  and  serious  implications  for  a  network  of  other 
mathematical ideas. In this case, the ideas under assault were the Greeks’
understanding of the relationship between geometric length and arithmetic number.
Prior  to  the  preceding  discovery,  it  was  an  assumed  and  commonly  used  fact 
that,  given  two  line  segments  AB  and  CD,  it  would  always  be  possible  to  find 
a  third  line  segment  whose  length  divides  evenly  into  the  first  two.  In  modern 
terminology,  this  is  equivalent  to  asserting  that  the  length  of  CD  is  a  rational 
multiple  of  the  length  of  AB.  Looking  at  the  diagonal  of  a  unit  square  (Fig.  1.1), 
it  now  followed  (using  the  Pythagorean  Theorem)  that  this  was  not  always  the 
case.  Because  the  Pythagoreans  implicitly  interpreted  number  to  mean  rational 
number,  they  were  forced  to  accept  that  number  was  a  strictly  weaker  notion 
than  length. 

Rather  than  abandoning  arithmetic  in  favor  of  geometry  (as  the  Greeks  seem 
to  have  done),  our  resolution  to  this  limitation  is  to  strengthen  the  concept  of 
number  by  moving  from  the  rational  numbers  to  a  larger  number  system.  From 
a  modern  point  of  view,  this  should  seem  like  a  familiar  and  somewhat  natural 
phenomenon.  We  begin  with  the  natural  numbers 


$\mathbf{N} = \{1, 2, 3, 4, 5, \ldots\}$.

Figure 1.1: $\sqrt{2}$ exists as a geometric length.

The  influential  German  mathematician  Leopold  Kronecker  (1823-1891)  once 
asserted  that  “The  natural  numbers  are  the  work  of  God.  All  of  the  rest  is 
the  work  of  mankind.”  Debating  the  validity  of  this  claim  is  an  interesting 
conversation  for  another  time.  For  the  moment,  it  at  least  provides  us  with 
a  place  to  start.  If  we  restrict  our  attention  to  the  natural  numbers  N,  then 
we  can  perform  addition  perfectly  well,  but  we  must  extend  our  system  to  the 
integers 

$\mathbf{Z} = \{\ldots, -3, -2, -1, 0, 1, 2, 3, \ldots\}$

if  we  want  to  have  an  additive  identity  (zero)  and  the  additive  inverses  necessary 
to  define  subtraction.  The  next  issue  is  multiplication  and  division.  The  number 
1  acts  as  the  multiplicative  identity,  but  in  order  to  define  division  we  need  to 
have  multiplicative  inverses.  Thus,  we  extend  our  system  again  to  the  rational 
numbers 

$\mathbf{Q} = \left\{ \text{all fractions } \frac{p}{q} \text{ where } p \text{ and } q \text{ are integers with } q \neq 0 \right\}$.

Taken  together,  the  properties  of  Q  discussed  in  the  previous  paragraph 
essentially  make  up  the  definition  of  what  is  called  a  field.  More  formally  stated, 
a  field  is  any  set  where  addition  and  multiplication  are  well-defined  operations 
that  are  commutative,  associative,  and  obey  the  familiar  distributive  property 
$a(b + c) = ab + ac$. There must be an additive identity, and every element must
have  an  additive  inverse.  Finally,  there  must  be  a  multiplicative  identity,  and 
multiplicative  inverses  must  exist  for  all  nonzero  elements  of  the  field.  Neither 
Z nor N is a field. The finite set {0, 1, 2, 3, 4} is a field when addition and
multiplication  are  computed  modulo  5.  This  is  not  immediately  obvious  but 
makes  an  interesting  exercise. 
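Because the set {0, 1, 2, 3, 4} is finite, the field properties can be checked by brute force. The Python sketch below is one way to do so; the names `MODULUS`, `add`, and `mul` are illustrative choices made here. It verifies the distributive property and finds the additive and multiplicative inverses. Commutativity and associativity can be tested in the same way, and writing out an actual proof remains the exercise.

```python
MODULUS = 5
elements = range(MODULUS)  # the set {0, 1, 2, 3, 4}


def add(a, b):
    """Addition computed modulo 5."""
    return (a + b) % MODULUS


def mul(a, b):
    """Multiplication computed modulo 5."""
    return (a * b) % MODULUS


# Distributive property a(b + c) = ab + ac, checked for every triple.
distributive = all(
    mul(a, add(b, c)) == add(mul(a, b), mul(a, c))
    for a in elements for b in elements for c in elements
)

# Every element needs an additive inverse (a + b = 0).
additive_inverses = {
    a: next(b for b in elements if add(a, b) == 0) for a in elements
}

# Every nonzero element needs a multiplicative inverse (a * b = 1).
multiplicative_inverses = {
    a: next(b for b in elements if mul(a, b) == 1) for a in elements if a != 0
}

print(distributive)             # True
print(additive_inverses)        # {0: 0, 1: 4, 2: 3, 3: 2, 4: 1}
print(multiplicative_inverses)  # {1: 1, 2: 3, 3: 2, 4: 4}
```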

The  set  Q  also  has  a  natural  order  defined  on  it.  Given  any  two  rational 
numbers r and s, exactly one of the following is true:

$r < s$,  $r = s$,  or  $r > s$.

Figure 1.2: Approximating $\sqrt{2}$ with rational numbers.


This ordering is transitive in the sense that if $r < s$ and $s < t$, then $r < t$, so
we  are  conveniently  led  to  a  mental  picture  of  the  rational  numbers  as  being 
laid  out  from  left  to  right  along  a  number  line.  Unlike  Z,  there  are  no  intervals 
of  empty  space.  Given  any  two  rational  numbers  r  <  s,  the  rational  number 
(r  +  s)/2  sits  halfway  in  between,  implying  that  the  rational  numbers  are  densely 
nestled  together. 
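The midpoint observation is easy to try out with exact rational arithmetic. The following Python sketch uses the standard `fractions` module; the helper name `midpoint` and the starting values 1/3 and 1/2 are arbitrary choices for illustration. Repeatedly replacing the right endpoint by the midpoint always produces a new rational strictly between the two, so the process never runs out of room.

```python
from fractions import Fraction


def midpoint(r, s):
    """The rational number halfway between r and s."""
    return (r + s) / 2


r, s = Fraction(1, 3), Fraction(1, 2)
for _ in range(5):
    m = midpoint(r, s)
    assert r < m < s  # the midpoint lies strictly between r and s
    print(m)
    s = m  # shrink the interval and repeat
```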

With  the  field  properties  of  Q  allowing  us  to  safely  carry  out  the  algebraic 
operations  of  addition,  subtraction,  multiplication,  and  division,  let’s  remind 
ourselves  just  what  it  is  that  Q  is  lacking.  By  Theorem  1.1.1,  it  is  apparent 
that  we  cannot  always  take  square  roots.  The  problem,  however,  is  actually 
more  fundamental  than  this.  Using  only  rational  numbers,  it  is  possible  to 
approximate $\sqrt{2}$ quite well (Fig. 1.2). For instance, $1.414^2 = 1.999396$. By
adding more decimal places to our approximation, we can get even closer to
a value for $\sqrt{2}$, but, even so, we are now well aware that there is a “hole” in
the rational number line where $\sqrt{2}$ ought to be. Of course, there are quite a
few other holes, at $\sqrt{3}$ and $\sqrt{5}$, for example. Returning to the dilemma of the
ancient  Greek  mathematicians,  if  we  want  every  length  along  the  number  line  to 
correspond  to  an  actual  number,  then  another  extension  to  our  number  system 
is in order. Thus, to the chain $\mathbf{N} \subseteq \mathbf{Z} \subseteq \mathbf{Q}$ we append the real numbers R.
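The idea of closing in on the hole with rational numbers can be sketched in a few lines of Python. The function name `truncated_sqrt2` is an assumption made for this illustration. For each number of decimal places it returns the largest rational of the form $k/10^d$ whose square does not exceed 2, computed with exact integer arithmetic so that no rounding error enters.

```python
from fractions import Fraction
from math import isqrt


def truncated_sqrt2(digits):
    """Largest rational k / 10**digits whose square does not exceed 2."""
    scale = 10 ** digits
    return Fraction(isqrt(2 * scale * scale), scale)


for digits in range(1, 7):
    r = truncated_sqrt2(digits)
    gap = 2 - r * r  # always positive: no rational squares to exactly 2
    print(digits, r, float(gap))
```

With three decimal places the approximation is 1.414, whose square is 1.999396. The gap shrinks as digits are added but, by Theorem 1.1.1, it never reaches zero.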

The  question  of  how  to  actually  construct  R  from  Q  is  rather  complicated 
business. It is discussed in Section 1.3, and then again in more detail in
Section 8.6. For the moment, it is not too inaccurate to say that R is obtained by
filling  in  the  gaps  in  Q.  Wherever  there  is  a  hole,  a  new  irrational  number  is 
defined  and  placed  into  the  ordering  that  already  exists  on  Q.  The  real  numbers 
are  then  the  union  of  these  irrational  numbers  together  with  the  more  familiar 
rational  ones.  What  properties  does  the  set  of  irrational  numbers  have?  How 
do  the  sets  of  rational  and  irrational  numbers  fit  together?  Is  there  a  kind  of 
symmetry  between  the  rationals  and  the  irrationals,  or  is  there  some  sense  in 
which  we  can  argue  that  one  type  of  real  number  is  more  common  than  the 
other? The one method we have seen so far for generating examples of irrational
numbers is through square roots. Not too surprisingly, other roots such
as $\sqrt[3]{2}$ or $\sqrt[5]{3}$ are most often irrational. Can all irrational numbers be expressed
as  algebraic  combinations  of  nth  roots  and  rational  numbers,  or  are  there  still 
other irrational numbers beyond those of this form?


1.2 Some Preliminaries

The  vocabulary  necessary  for  the  ensuing  development  comes  from  set  theory 
and  the  theory  of  functions.  This  should  be  familiar  territory,  but  a  brief  review 
of  the  terminology  is  probably  a  good  idea,  if  only  to  establish  some  agreed-upon 
notation. 

Sets 

Intuitively  speaking,  a  set  is  any  collection  of  objects.  These  objects  are  referred 
to  as  the  elements  of  the  set.  For  our  purposes,  the  sets  in  question  will  most 
often  be  sets  of  real  numbers,  although  we  will  also  encounter  sets  of  functions 
and,  on  a  few  occasions,  sets  whose  elements  are  other  sets. 

Given a set A, we write $x \in A$ if x (whatever it may be) is an element of A.
If x is not an element of A, then we write $x \notin A$. Given two sets A and B, the
union is written $A \cup B$ and is defined by asserting that

$x \in A \cup B$ provided that $x \in A$ or $x \in B$ (or potentially both).

The intersection $A \cap B$ is the set defined by the rule

$x \in A \cap B$ provided $x \in A$ and $x \in B$.

Example 1.2.1. (i) There are many acceptable ways to assert the contents
of a set. In the previous section, the set of natural numbers was defined
by listing the elements: $\mathbf{N} = \{1, 2, 3, \ldots\}$.

(ii)  Sets  can  also  be  described  in  words.  For  instance,  we  can  define  the  set  E 
to  be  the  collection  of  even  natural  numbers. 

(iii)  Sometimes  it  is  more  efficient  to  provide  a  kind  of  rule  or  algorithm  for 
determining  the  elements  of  a  set.  As  an  example,  let 

$S = \{r \in \mathbf{Q} : r^2 < 2\}$.

Read aloud, the definition of S says, “Let S be the set of all rational
numbers whose squares are less than 2.” It follows that $1 \in S$, $4/3 \in S$,
but $3/2 \notin S$ because $9/4 > 2$.

Using  the  previously  defined  sets  to  illustrate  the  operations  of  intersection 
and  union,  we  observe  that 

$\mathbf{N} \cup E = \mathbf{N}$,  $\mathbf{N} \cap E = E$,  $\mathbf{N} \cap S = \{1\}$,  and  $E \cap S = \emptyset$.

The set $\emptyset$ is called the empty set and is understood to be the set that
contains no elements. An equivalent statement would be to say that E and S are
disjoint.

A word about the equality of two sets is in order (since we have just used the
notion). The inclusion relationship $A \subseteq B$ or $B \supseteq A$ is used to indicate that
every element of A is also an element of B. In this case, we say A is a subset of
B, or B contains A. To assert that $A = B$ means that $A \subseteq B$ and $B \subseteq A$. Put
another  way,  A  and  B  have  exactly  the  same  elements. 
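The set identities from Example 1.2.1 can be tried out on a computer, with one caveat: N and E are infinite, so the Python sketch below works only with an initial segment. The cutoff `LIMIT` and the helper name `in_S` are assumptions introduced for this illustration. The set S is represented by its membership rule rather than by a list of elements, mirroring part (iii) of the example.

```python
from fractions import Fraction

LIMIT = 20  # assumed cutoff: N and E are infinite, so only a segment is stored

N = set(range(1, LIMIT + 1))      # {1, 2, ..., LIMIT}
E = {n for n in N if n % 2 == 0}  # even natural numbers up to LIMIT


def in_S(r):
    """Membership rule for S = {r in Q : r^2 < 2}."""
    return Fraction(r) ** 2 < 2


print(in_S(1), in_S(Fraction(4, 3)), in_S(Fraction(3, 2)))  # True True False
print(N | E == N)                  # True: union of N and E is N
print(N & E == E)                  # True: intersection of N and E is E
print({n for n in N if in_S(n)})   # {1}
print({n for n in E if in_S(n)})   # set(), the empty set
```

The equality tests use the same criterion as the text: two sets are equal when each is a subset of the other.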

Quite  frequently  in  the  upcoming  chapters,  we  will  want  to  apply  the  union 
and  intersection  operations  to  infinite  collections  of  sets. 

Example  1.2.2.  Let 


$A_1 = \mathbf{N} = \{1, 2, 3, \ldots\}$,
$A_2 = \{2, 3, 4, \ldots\}$,
$A_3 = \{3, 4, 5, \ldots\}$,

and, in general, for each $n \in \mathbf{N}$, define the set

$A_n = \{n, n + 1, n + 2, \ldots\}$.

The result is a nested chain of sets

$A_1 \supseteq A_2 \supseteq A_3 \supseteq A_4 \supseteq \cdots$,

where each successive set is a subset of all the previous ones. Notationally,

$\bigcup_{n=1}^{\infty} A_n$,  $\bigcup_{n \in \mathbf{N}} A_n$,  or  $A_1 \cup A_2 \cup A_3 \cup \cdots$

are all equivalent ways to indicate the set whose elements consist of any element
that appears in at least one particular $A_n$. Because of the nested property of
this particular collection of sets, it is not too hard to see that

$\bigcup_{n=1}^{\infty} A_n = A_1$.

The notion of intersection has the same kind of natural extension to infinite
collections of sets. For this example, we have

$\bigcap_{n=1}^{\infty} A_n = \emptyset$.

Let’s be sure we understand why this is the case. Suppose we had some natural
number m that we thought might actually satisfy $m \in \bigcap_{n=1}^{\infty} A_n$. What this
would mean is that $m \in A_n$ for every $A_n$ in our collection of sets. Because m
is not an element of $A_{m+1}$, no such m exists and the intersection is empty.
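The argument for the empty intersection amounts to producing, for each candidate m, one set in the collection that excludes it. Here is a minimal Python sketch of that idea. Since each $A_n$ is infinite, it is represented by a membership test instead of a stored set; the function names `in_A` and `excluding_index` are hypothetical and chosen only for this illustration.

```python
def in_A(n, m):
    """Membership test for A_n = {n, n + 1, n + 2, ...}: is m in A_n?"""
    return m >= n


def excluding_index(m):
    """An index n for which m is not in A_n, namely n = m + 1."""
    return m + 1


for m in range(1, 11):
    n = excluding_index(m)
    # m belongs to A_1, ..., A_m but fails to belong to A_{m+1}.
    assert all(in_A(k, m) for k in range(1, m + 1))
    assert not in_A(n, m)

print("every m from 1 to 10 is excluded by some A_n")
```

The loop checks only finitely many values of m. The general statement follows because the same choice n = m + 1 works for every natural number.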

As  mentioned,  most  of  the  sets  we  encounter  will  be  sets  of  real  numbers. 
Given $A \subseteq \mathbf{R}$, the complement of A, written $A^c$, refers to the set of all elements
of R not in A. Thus, for $A \subseteq \mathbf{R}$,

$A^c = \{x \in \mathbf{R} : x \notin A\}$.

A few times in our work to come, we will refer to De Morgan’s Laws, which
state that

$(A \cap B)^c = A^c \cup B^c$  and  $(A \cup B)^c = A^c \cap B^c$.

Proofs  of  these  statements  are  discussed  in  Exercise  1.2.5. 
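Before attempting the proofs, it can be reassuring to see De Morgan's Laws hold in a small setting. The Python sketch below replaces R by a six-element universe, which is purely an assumption made so that every pair of subsets can be examined; the names `UNIVERSE`, `complement`, and `all_subsets` are likewise illustrative. A finite check of this kind suggests why the laws are true but does not replace the element-chasing argument.

```python
from itertools import combinations

UNIVERSE = frozenset(range(6))  # small stand-in for R (an assumption)


def complement(A):
    """Complement of A relative to UNIVERSE."""
    return UNIVERSE - A


def all_subsets(universe):
    """Yield every subset of the given finite set."""
    items = sorted(universe)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


subsets = list(all_subsets(UNIVERSE))

# Check both laws for every pair (A, B) of subsets.
de_morgan_holds = all(
    complement(A & B) == complement(A) | complement(B)
    and complement(A | B) == complement(A) & complement(B)
    for A in subsets
    for B in subsets
)

print(len(subsets))     # 64
print(de_morgan_holds)  # True
```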

Admittedly, there is something imprecise about the definition of set
presented at the beginning of this discussion. The defining sentence begins with
the phrase “Intuitively speaking,” which might seem an odd way to embark on a
course of study that purportedly intends to supply a rigorous foundation for the
theory of functions of a real variable. In some sense, however, this is
unavoidable. Each repair of one level of the foundation reveals something below it in
need  of  attention.  The  theory  of  sets  has  been  subjected  to  intense  scrutiny  over 
the  past  century  precisely  because  so  much  of  modern  mathematics  rests  on  this 
foundation.  But  such  a  study  is  really  only  advisable  once  it  is  understood  why 
our  naive  impression  about  the  behavior  of  sets  is  insufficient.  For  the  direction 
in  which  we  are  heading,  this  will  not  happen,  although  an  indication  of  some 
potential  pitfalls  is  given  in  Section  1.7 . 

Functions 

Definition 1.2.3. Given two sets A and B, a function from A to B is a rule or
mapping that takes each element $x \in A$ and associates with it a single element
of B. In this case, we write $f : A \to B$. Given an element $x \in A$, the expression
f(x) is used to represent the element of B associated with x by f. The set A is
called the domain of f. The range of f is not necessarily equal to B but refers
to the subset of B given by $\{y \in B : y = f(x) \text{ for some } x \in A\}$.

This  definition  of  function  is  more  or  less  the  one  proposed  by  Peter  Lejeune 
Dirichlet  (1805-1859)  in  the  1830s.  Dirichlet  was  a  German  mathematician  who 
was  one  of  the  leaders  in  the  development  of  the  rigorous  approach  to  functions 
that  we  are  about  to  undertake.  His  main  motivation  was  to  unravel  the  issues 
surrounding  the  convergence  of  Fourier  series.  Dirichlet’s  contributions  figure 
prominently  in  Section  8.5,  where  an  introduction  to  Fourier  series  is  presented, 
but  we  will  also  encounter  his  name  in  several  earlier  chapters  along  the  way. 
What  is  important  at  the  moment  is  that  we  see  how  Dirichlet’s  definition 
of  function  liberates  the  term  from  its  interpretation  as  a  type  of  “formula.” 
In  the  years  leading  up  to  Dirichlet’s  time,  the  term  “function”  was  generally 
understood to refer to algebraic entities such as $f(x) = x^2 + 1$ or $g(x) = \sqrt{x^4 + 4}$.
Definition  1.2.3  allows  for  a  much  broader  range  of  possibilities. 

Example  1.2.4.  In  1829,  Dirichlet  proposed  the  unruly  function 

$$g(x) = \begin{cases} 1 & \text{if } x \in \mathbf{Q} \\ 0 & \text{if } x \notin \mathbf{Q}. \end{cases}$$

The  domain  of  g  is  all  of  R,  and  the  range  is  the  set  {0, 1}.  There  is  no  single 
formula  for  g  in  the  usual  sense,  and  it  is  quite  difficult  to  graph  this  function 
(see Section 4.1 for a rough attempt), but it certainly qualifies as a function
according to the criterion in Definition 1.2.3. As we study the theoretical nature
of  continuous,  differentiable,  or  integrable  functions,  examples  such  as  this  one 
will  provide  us  with  an  invaluable  testing  ground  for  the  many  conjectures  we 
encounter. 


Example 1.2.5 (Triangle Inequality). The absolute value function is so
important that it merits the special notation $|x|$ in place of the usual f(x) or
g(x). It is defined for every real number via the piecewise definition

$$|x| = \begin{cases} x & \text{if } x \ge 0 \\ -x & \text{if } x < 0. \end{cases}$$

With respect to multiplication and addition, the absolute value function satisfies

(i) $|ab| = |a||b|$ and

(ii) $|a + b| \le |a| + |b|$

for all choices of a and b. Verifying these properties (Exercise 1.2.6) is just a
matter of examining the different cases that arise when a, b, and a + b are positive
and negative. Property (ii) is called the triangle inequality. This innocuous-looking
inequality turns out to be fantastically important and will be frequently
employed in the following way. Given three real numbers a, b, and c, we certainly
have

$$|a - b| = |(a - c) + (c - b)|.$$

By the triangle inequality,

$$|(a - c) + (c - b)| \le |a - c| + |c - b|,$$

so we get

$$|a - b| \le |a - c| + |c - b|. \tag{1}$$

Now, the expression $|a - b|$ is equal to $|b - a|$ and is best understood as the distance
between the points a and b on the number line. With this interpretation,
equation (1) makes the plausible statement that the distance from a to b is less
than or equal to the distance from a to c plus the distance from c to b. Pretending
for a moment that these are points in the plane (instead of on the real
line), it should be evident why this is referred to as the “triangle inequality.”
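
The distance form of the triangle inequality can be spot-checked numerically. The following is a minimal sketch, assuming a small grid of rational sample points (chosen here purely for illustration) and exact arithmetic via Python's fractions module so that no rounding error enters; the helper name distance is hypothetical.

```python
from fractions import Fraction
from itertools import product

# A small grid of exact rational test points (illustrative sample values).
points = [Fraction(k, 4) for k in range(-8, 9)]

def distance(x, y):
    """Distance between x and y on the number line, |x - y|."""
    return abs(x - y)

# Check |a - b| <= |a - c| + |c - b| for every sampled triple (a, b, c).
holds = all(
    distance(a, b) <= distance(a, c) + distance(c, b)
    for a, b, c in product(points, repeat=3)
)
print(holds)
```

A finite check like this cannot replace the case analysis of Exercise 1.2.6, but it is a cheap way to catch a misremembered inequality.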


Logic  and  Proofs 

Writing  rigorous  mathematical  proofs  is  a  skill  best  learned  by  doing,  and  there 
is plenty of on-the-job training just ahead. As Hardy indicates, there is an artistic
quality to mathematics of this type, which may or may not come easily, but
that  is  not  to  say  that  anything  especially  mysterious  is  happening.  A  proof  is 
an  essay  of  sorts.  It  is  a  set  of  carefully  crafted  directions,  which,  when  followed, 
should leave the reader absolutely convinced of the truth of the proposition in
question. To achieve this, the steps in a proof must follow logically from previous
steps or be justified by some other agreed-upon set of facts. In addition
to  being  valid,  these  steps  must  also  fit  coherently  together  to  form  a  cogent 
argument.  Mathematics  has  a  specialized  vocabulary,  to  be  sure,  but  that  does 
not  exempt  a  good  proof  from  being  written  in  grammatically  correct  English. 

The  one  proof  we  have  seen  at  this  point  (to  Theorem  1.1.1)  uses  an  indirect 
strategy  called  proof  by  contradiction.  This  powerful  technique  will  be  employed 
a  number  of  times  in  our  upcoming  work.  Nevertheless,  most  proofs  are  direct. 
(It  also  bears  mentioning  that  using  an  indirect  proof  when  a  direct  proof  is 
available  is  generally  considered  bad  form.)  A  direct  proof  begins  from  some 
valid statement, most often taken from the theorem’s hypothesis, and then proceeds
through rigorously logical deductions to a demonstration of the theorem’s
conclusion. As we saw in Theorem 1.1.1, an indirect proof always begins by
negating what it is we would like to prove. This is not always as easy to do as it
may sound. The argument then proceeds until (hopefully) a logical contradiction
with some other accepted fact is uncovered. Many times, this accepted fact
is  part  of  the  hypothesis  of  the  theorem.  When  the  contradiction  is  with  the 
theorem’s  hypothesis,  we  technically  have  what  is  called  a  contrapositive  proof. 

The  next  proposition  illustrates  a  number  of  the  issues  just  discussed  and 
introduces  a  few  more. 


Theorem 1.2.6. Two real numbers a and b are equal if and only if for every
real number $\epsilon > 0$ it follows that $|a - b| < \epsilon$.

Proof.  There  are  two  key  phrases  in  the  statement  of  this  proposition  that 
warrant  special  attention.  One  is  “for  every,”  which  will  be  addressed  in  a 
moment.  The  other  is  “if  and  only  if.”  To  say  “if  and  only  if”  in  mathematics 
is  an  economical  way  of  stating  that  the  proposition  is  true  in  two  directions. 
In the forward direction, we must prove the statement:

($\Rightarrow$) If a = b, then for every real number $\epsilon > 0$ it follows that $|a - b| < \epsilon$.

We must also prove the converse statement:

($\Leftarrow$) If for every real number $\epsilon > 0$ it follows that $|a - b| < \epsilon$, then we must
have a = b.

For the proof of the first statement, there is really not much to say. If a = b,
then $|a - b| = 0$, and so certainly $|a - b| < \epsilon$ no matter what $\epsilon > 0$ is chosen.

For the second statement, we give a proof by contradiction. The conclusion
of the proposition in this direction states that a = b, so we assume that $a \ne b$.
Heading off in search of a contradiction brings us to a consideration of the phrase
“for every $\epsilon > 0$.” Some equivalent ways to state the hypothesis would be to
say that “for all possible choices of $\epsilon > 0$” or “no matter how $\epsilon > 0$ is selected,
it is always the case that $|a - b| < \epsilon$.” But assuming $a \ne b$ (as we are doing at
the moment), the choice of

$$\epsilon_0 = |a - b| > 0$$

poses a serious problem. We are assuming that $|a - b| < \epsilon$ is true for every
$\epsilon > 0$, so this must certainly be true of the particular $\epsilon_0$ just defined. However,
the statements

$$|a - b| < \epsilon_0 \quad \text{and} \quad |a - b| = \epsilon_0$$

cannot both be true. This contradiction means that our initial assumption that
$a \ne b$ is unacceptable. Therefore, a = b, and the indirect proof is complete. □


One  of  the  most  fundamental  skills  required  for  reading  and  writing  analysis 
proofs  is  the  ability  to  confidently  manipulate  the  quantifying  phrases  “for  all” 
and  “there  exists.”  Significantly  more  attention  will  be  given  to  this  issue  in 
many  upcoming  discussions. 


Induction 

One  final  trick  of  the  trade,  which  will  arise  with  some  frequency,  is  the  use  of 
induction  arguments.  Induction  is  used  in  conjunction  with  the  natural  numbers 
N  (or  sometimes  with  the  set  N  U  {0}).  The  fundamental  principle  behind 
induction  is  that  if  S  is  some  subset  of  N  with  the  property  that 

(i)  S  contains  1  and 

(ii)  whenever  S  contains  a  natural  number  n,  it  also  contains  n  +  1, 

then  it  must  be  that  S  =  N.  As  the  next  example  illustrates,  this  principle  can 
be  used  to  define  sequences  of  objects  as  well  as  to  prove  facts  about  them. 

Example 1.2.7. Let $x_1 = 1$, and for each $n \in \mathbf{N}$ define

$$x_{n+1} = (1/2)x_n + 1.$$

Using this rule, we can compute $x_2 = (1/2)(1) + 1 = 3/2$, $x_3 = 7/4$, and it is
immediately apparent how this leads to a definition of $x_n$ for all $n \in \mathbf{N}$.

The sequence just defined appears at the outset to be increasing. For the
terms computed, we have $x_1 \le x_2 \le x_3$. Let’s use induction to prove that this
trend continues; that is, let’s show

$$x_n \le x_{n+1} \tag{2}$$

for all values of $n \in \mathbf{N}$.

For n = 1, $x_1 = 1$ and $x_2 = 3/2$, so that $x_1 \le x_2$ is clear. Now, we want to
show that

if we have $x_n \le x_{n+1}$, then it follows that $x_{n+1} \le x_{n+2}$.

Think of S as the set of natural numbers for which the claim in equation (2)
is true. We have shown that $1 \in S$. We are now interested in showing that if
$n \in S$, then $n + 1 \in S$ as well. Starting from the induction hypothesis $x_n \le x_{n+1}$,
we can multiply across the inequality by 1/2 and add 1 to get

$$(1/2)x_n + 1 \le (1/2)x_{n+1} + 1,$$

which is precisely the desired conclusion $x_{n+1} \le x_{n+2}$. By induction, the claim
is proved for all $n \in \mathbf{N}$.
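
The recursive definition in Example 1.2.7 translates directly into a short computation. The sketch below assumes the recurrence x_1 = 1, x_{n+1} = (1/2)x_n + 1 and uses exact fractions; the function name first_terms and the choice of ten terms are illustrative only.

```python
from fractions import Fraction

def first_terms(count):
    """Return [x_1, ..., x_count] for x_1 = 1 and x_{n+1} = (1/2) x_n + 1."""
    terms = [Fraction(1)]
    while len(terms) < count:
        terms.append(Fraction(1, 2) * terms[-1] + 1)
    return terms

xs = first_terms(10)
print([str(x) for x in xs[:3]])

# Compare each computed term with its successor.
increasing = all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1))
print(increasing)
```

The computation confirms the trend only for the terms it produces; the induction argument is what extends the claim to every n in N.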

Any discussion about why induction is a valid argumentative technique immediately
opens up a box of questions about how we understand the natural
numbers. Earlier, in Section 1.1, we avoided this issue by referencing Kronecker’s
famous comment that the natural numbers are somehow divinely given.
Although  we  will  not  improve  on  this  explanation  here,  it  should  be  pointed  out 
that  a  more  atheistic  and  mathematically  satisfying  approach  to  N  is  possible 
from  the  point  of  view  of  axiomatic  set  theory.  This  brings  us  back  to  a  recurring 
theme  of  this  chapter.  Pedagogically  speaking,  the  foundations  of  mathematics 
are  best  learned  and  appreciated  in  a  kind  of  reverse  order.  A  rigorous  study  of 
the  natural  numbers  and  the  theory  of  sets  is  certainly  recommended,  but  only 
after  we  have  an  understanding  of  the  subtleties  of  the  real  number  system.  It 
is  this  latter  topic  that  is  the  business  of  real  analysis. 

Exercises 

Exercise 1.2.1. (a) Prove that $\sqrt{3}$ is irrational. Does a similar argument
work to show $\sqrt{6}$ is irrational?

(b) Where does the proof of Theorem 1.1.1 break down if we try to use it to
prove $\sqrt{4}$ is irrational?

Exercise 1.2.2. Show that there is no rational number r satisfying $2^r = 3$.

Exercise  1.2.3.  Decide  which  of  the  following  represent  true  statements  about 
the  nature  of  sets.  For  any  that  are  false,  provide  a  specific  example  where  the 
statement  in  question  does  not  hold. 

(a) If $A_1 \supseteq A_2 \supseteq A_3 \supseteq A_4 \cdots$ are all sets containing an infinite number of
elements, then the intersection $\bigcap_{n=1}^{\infty} A_n$ is infinite as well.

(b) If $A_1 \supseteq A_2 \supseteq A_3 \supseteq A_4 \cdots$ are all finite, nonempty sets of real numbers,
then the intersection $\bigcap_{n=1}^{\infty} A_n$ is finite and nonempty.

(c) $A \cap (B \cup C) = (A \cap B) \cup C$.

(d) $A \cap (B \cap C) = (A \cap B) \cap C$.

(e) $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$.

Exercise 1.2.4. Produce an infinite collection of sets $A_1, A_2, A_3, \ldots$ with the
property that every $A_i$ has an infinite number of elements, $A_i \cap A_j = \emptyset$ for all
$i \ne j$, and $\bigcup_{i=1}^{\infty} A_i = \mathbf{N}$.

Exercise  1.2.5  (De  Morgan’s  Laws).  Let  A  and  B  be  subsets  of  R. 


(a) If $x \in (A \cap B)^c$, explain why $x \in A^c \cup B^c$. This shows that $(A \cap B)^c \subseteq
A^c \cup B^c$.

(b) Prove the reverse inclusion $(A \cap B)^c \supseteq A^c \cup B^c$, and conclude that
$(A \cap B)^c = A^c \cup B^c$.

(c) Show $(A \cup B)^c = A^c \cap B^c$ by demonstrating inclusion both ways.

Exercise 1.2.6. (a) Verify the triangle inequality in the special case where
a and b have the same sign.

(b) Find an efficient proof for all the cases at once by first demonstrating
$(a + b)^2 \le (|a| + |b|)^2$.

(c) Prove $|a - b| \le |a - c| + |c - d| + |d - b|$ for all a, b, c, and d.

(d) Prove $||a| - |b|| \le |a - b|$. (The unremarkable identity a = a − b + b may
be useful.)

Exercise 1.2.7. Given a function f and a subset A of its domain, let f(A)
represent the range of f over the set A; that is, $f(A) = \{f(x) : x \in A\}$.

(a) Let $f(x) = x^2$. If A = [0, 2] (the closed interval $\{x \in \mathbf{R} : 0 \le x \le 2\}$)
and B = [1, 4], find f(A) and f(B). Does $f(A \cap B) = f(A) \cap f(B)$ in this
case? Does $f(A \cup B) = f(A) \cup f(B)$?

(b) Find two sets A and B for which $f(A \cap B) \ne f(A) \cap f(B)$.

(c) Show that, for an arbitrary function $g : \mathbf{R} \to \mathbf{R}$, it is always true that
$g(A \cap B) \subseteq g(A) \cap g(B)$ for all sets $A, B \subseteq \mathbf{R}$.

(d) Form and prove a conjecture about the relationship between $g(A \cup B)$ and
$g(A) \cup g(B)$ for an arbitrary function g.

Exercise 1.2.8. Here are two important definitions related to a function
$f : A \to B$. The function f is one-to-one (1-1) if $a_1 \ne a_2$ in A implies that
$f(a_1) \ne f(a_2)$ in B. The function f is onto if, given any $b \in B$, it is possible to find an
element $a \in A$ for which f(a) = b.

Give an example of each or state that the request is impossible:

(a) $f : \mathbf{N} \to \mathbf{N}$ that is 1-1 but not onto.

(b) $f : \mathbf{N} \to \mathbf{N}$ that is onto but not 1-1.

(c) $f : \mathbf{N} \to \mathbf{Z}$ that is 1-1 and onto.

Exercise 1.2.9. Given a function $f : D \to \mathbf{R}$ and a subset $B \subseteq \mathbf{R}$, let $f^{-1}(B)$
be the set of all points from the domain D that get mapped into B; that is,
$f^{-1}(B) = \{x \in D : f(x) \in B\}$. This set is called the preimage of B.

(a) Let $f(x) = x^2$. If A is the closed interval [0, 4] and B is the closed interval
[−1, 1], find $f^{-1}(A)$ and $f^{-1}(B)$. Does $f^{-1}(A \cap B) = f^{-1}(A) \cap f^{-1}(B)$
in this case? Does $f^{-1}(A \cup B) = f^{-1}(A) \cup f^{-1}(B)$?

(b) The good behavior of preimages demonstrated in (a) is completely general.
Show that for an arbitrary function $g : \mathbf{R} \to \mathbf{R}$, it is always true that
$g^{-1}(A \cap B) = g^{-1}(A) \cap g^{-1}(B)$ and $g^{-1}(A \cup B) = g^{-1}(A) \cup g^{-1}(B)$ for
all sets $A, B \subseteq \mathbf{R}$.

Exercise  1.2.10.  Decide  which  of  the  following  are  true  statements.  Provide  a 
short  justification  for  those  that  are  valid  and  a  counterexample  for  those  that 
are  not: 

(a) Two real numbers satisfy a < b if and only if $a < b + \epsilon$ for every $\epsilon > 0$.

(b) Two real numbers satisfy a < b if $a < b + \epsilon$ for every $\epsilon > 0$.

(c) Two real numbers satisfy $a \le b$ if and only if $a < b + \epsilon$ for every $\epsilon > 0$.

Exercise  1.2.11.  Form  the  logical  negation  of  each  claim.  One  trivial  way  to 

do  this  is  to  simply  add  “It  is  not  the  case  that. . .  ”  in  front  of  each  assertion. 
To  make  this  interesting,  fashion  the  negation  into  a  positive  statement  that 
avoids  using  the  word  “not”  altogether.  In  each  case,  make  an  intuitive  guess 
as  to  whether  the  claim  or  its  negation  is  the  true  statement. 

(a) For all real numbers satisfying a < b, there exists an $n \in \mathbf{N}$ such that
a + 1/n < b.

(b) There exists a real number x > 0 such that x < 1/n for all $n \in \mathbf{N}$.

(c) Between every two distinct real numbers there is a rational number.

Exercise 1.2.12. Let $y_1 = 6$, and for each $n \in \mathbf{N}$ define $y_{n+1} = (2y_n - 6)/3$.

(a) Use induction to prove that the sequence satisfies $y_n > -6$ for all $n \in \mathbf{N}$.

(b) Use another induction argument to show the sequence $(y_1, y_2, y_3, \ldots)$ is
decreasing.

Exercise  1.2.13.  For  this  exercise,  assume  Exercise  1.2.5  has  been  successfully 
completed. 

(a) Show how induction can be used to conclude that

$$(A_1 \cup A_2 \cup \cdots \cup A_n)^c = A_1^c \cap A_2^c \cap \cdots \cap A_n^c$$

for any finite $n \in \mathbf{N}$.

(b) It is tempting to appeal to induction to conclude

$$\left( \bigcup_{i=1}^{\infty} A_i \right)^c = \bigcap_{i=1}^{\infty} A_i^c,$$

but induction does not apply here. Induction is used to prove that a
particular statement holds for every value of $n \in \mathbf{N}$, but this does not
imply the validity of the infinite case. To illustrate this point, find an
example of a collection of sets $B_1, B_2, B_3, \ldots$ where $\bigcap_{i=1}^{n} B_i \ne \emptyset$ is true
for every $n \in \mathbf{N}$, but $\bigcap_{i=1}^{\infty} B_i \ne \emptyset$ fails.

(c) Nevertheless, the infinite version of De Morgan’s Law stated in (b) is a
valid statement. Provide a proof that does not use induction.


1.3  The  Axiom  of  Completeness 

What  exactly  is  a  real  number?  In  Section  1.1,  we  got  as  far  as  saying  that 
the  set  R  of  real  numbers  is  an  extension  of  the  rational  numbers  Q  in  which 
there are no holes or gaps. We want every length along the number line, such
as $\sqrt{2}$, to correspond to a real number and vice versa.

We  are  going  to  improve  on  this  definition,  but  as  we  do  so,  it  is  important 
to  keep  in  mind  our  earlier  acknowledgment  that  whatever  precise  statements 
we  formulate  will  necessarily  rest  on  other  unproven  assumptions  or  undefined 
terms. At some point, we must draw a line and confess that this is what we have
decided  to  accept  as  a  reasonable  place  to  start.  Naturally,  there  is  some  debate 
about  where  this  line  should  be  drawn.  One  way  to  view  the  mathematics  of 
the  19th  and  20th  centuries  is  as  a  stalwart  attempt  to  move  this  line  further 
and  further  back  toward  some  unshakable  foundation.  The  majority  of  the 
material  covered  in  this  book  is  attributable  to  the  mathematicians  working  in 
the  early  and  middle  parts  of  the  1800s.  Augustin  Louis  Cauchy  (1789-1857), 
Bernhard  Bolzano  (1781-1848),  Niels  Henrik  Abel  (1802-1829),  Peter  Lejeune 
Dirichlet,  Karl  Weierstrass  (1815-1897),  and  Bernhard  Riemann  (1826-1866)  all 
figure  prominently  in  the  discovery  of  the  theorems  that  follow.  But  here  is  the 
interesting  point.  Nearly  all  of  this  work  was  done  using  intuitive  assumptions 
about  the  nature  of  R  quite  similar  to  our  own  informal  understanding  at  this 
point.  Eventually,  enough  scrutiny  was  directed  at  the  detailed  structure  of  R 
so  that,  in  the  1870s,  a  handful  of  ways  to  rigorously  construct  R  from  Q  were 
proposed. 

Following  this  historical  model,  our  own  rigorous  construction  of  R  from  Q 
is  postponed  until  Section  8.6.  By  this  point,  the  need  for  such  a  construction 
will  be  more  justified  and  easier  to  appreciate.  In  the  meantime,  we  have  many 
proofs  to  write,  so  it  is  important  to  lay  down,  as  explicitly  as  possible,  the 
assumptions  that  we  intend  to  make  about  the  real  numbers. 

An  Initial  Definition  for  R 

First,  R  is  a  set  containing  Q.  The  operations  of  addition  and  multiplication 
on  Q  extend  to  all  of  R  in  such  a  way  that  every  element  of  R  has  an  additive 
inverse  and  every  nonzero  element  of  R  has  a  multiplicative  inverse.  Echoing 
the discussion in Section 1.1, we assume R is a field, meaning that addition
and multiplication of real numbers are commutative, associative, and the distributive
property holds. This allows us to perform all of the standard algebraic
manipulations that are second nature to us. We also assume that the familiar
properties of the ordering on Q extend to all of R. Thus, for example, such
deductions as “If a < b and c > 0, then ac < bc” will be carried out freely
without much comment. To summarize the situation in the official terminology
of the subject, we assume that R is an ordered field, which contains Q as a
subfield. (A rigorous definition of “ordered field” is presented in Section 8.6.)

[Figure 1.3: Definition of sup A and inf A.]

This  brings  us  to  the  final,  and  most  distinctive,  assumption  about  the  real 
number  system.  We  must  find  some  way  to  clearly  articulate  what  we  mean  by 
insisting  that  R  does  not  contain  the  gaps  that  permeate  Q.  Because  this  is  the 
defining  difference  between  the  rational  numbers  and  the  real  numbers,  we  will 
be  excessively  precise  about  how  we  phrase  this  assumption,  hereafter  referred 
to  as  the  Axiom  of  Completeness. 

Axiom  of  Completeness.  Every  nonempty  set  of  real  numbers  that  is  bounded 
above  has  a  least  upper  bound. 

Now,  what  exactly  does  this  mean? 

Least  Upper  Bounds  and  Greatest  Lower  Bounds 

Let’s  first  state  the  relevant  definitions,  and  then  look  at  some  examples. 

Definition 1.3.1. A set $A \subseteq \mathbf{R}$ is bounded above if there exists a number $b \in \mathbf{R}$
such that $a \le b$ for all $a \in A$. The number b is called an upper bound for A.

Similarly, the set A is bounded below if there exists a lower bound $l \in \mathbf{R}$
satisfying $l \le a$ for every $a \in A$.


Definition 1.3.2. A real number s is the least upper bound for a set $A \subseteq \mathbf{R}$ if
it meets the following two criteria:

(i) s is an upper bound for A;

(ii) if b is any upper bound for A, then $s \le b$.

The least upper bound is also frequently called the supremum of the set A.
Although the notation s = lub A is sometimes used, we will always write s =
sup A for the least upper bound.

The  greatest  lower  bound  or  infimum  for  A  is  defined  in  a  similar  way 
(Exercise  1.3.1)  and  is  denoted  by  inf  A  (Fig.  1.3). 

Although a set can have a host of upper bounds, it can have only one least
upper bound. If $s_1$ and $s_2$ are both least upper bounds for a set A, then
by property (ii) in Definition 1.3.2 we can assert $s_1 \le s_2$ and $s_2 \le s_1$. The
conclusion is that $s_1 = s_2$ and least upper bounds are unique.

Example 1.3.3. Let

$$A = \left\{ \frac{1}{n} : n \in \mathbf{N} \right\}.$$

The set A is bounded above and below. Successful candidates for an upper
bound include 3, 2, and 3/2. For the least upper bound, we claim sup A = 1.
To argue this rigorously using Definition 1.3.2, we need to verify that properties
(i) and (ii) hold. For (i), we just observe that $1 \ge 1/n$ for all choices of $n \in \mathbf{N}$.
To verify (ii), we begin by assuming we are in possession of some other upper
bound b. Because $1 \in A$ and b is an upper bound for A, we must have $1 \le b$.
This is precisely what property (ii) asks us to show.

Although  we  do  not  quite  have  the  tools  we  need  for  a  rigorous  proof  (see 
Theorem  1.4.2),  it  should  be  somewhat  apparent  that  inf  A  =  0. 
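
The claims sup A = 1 and inf A = 0 can be made plausible by looking at finite pieces of the set. The sketch below assumes A = {1/n : n in N}, as the surrounding argument indicates, and works with the truncations {1/n : 1 <= n <= N}; the name truncated_A and the sample values of N are illustrative.

```python
from fractions import Fraction

def truncated_A(N):
    """Return the finite set {1/n : 1 <= n <= N}."""
    return {Fraction(1, n) for n in range(1, N + 1)}

for N in (1, 10, 1000):
    A_N = truncated_A(N)
    # The largest element is always 1; the smallest element is 1/N.
    print(N, max(A_N), min(A_N))
```

The largest element stays at 1, which belongs to A, while the smallest element 1/N shrinks toward 0 without ever reaching it. This is consistent with, but does not prove, the statement that inf A = 0 is not an element of A.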

An  important  lesson  to  take  from  Example  1.3.3  is  that  sup  A  and  inf  A  may 
or  may  not  be  elements  of  the  set  A.  This  issue  is  tied  to  understanding  the 
crucial  difference  between  the  maximum  and  the  supremum  (or  the  minimum 
and  the  infimum)  of  a  given  set. 

Definition 1.3.4. A real number $a_0$ is a maximum of the set A if $a_0$ is an
element of A and $a_0 \ge a$ for all $a \in A$. Similarly, a number $a_1$ is a minimum of
A if $a_1 \in A$ and $a_1 \le a$ for every $a \in A$.

Example  1.3.5.  To  belabor  the  point,  consider  the  open  interval 


$$(0, 2) = \{x \in \mathbf{R} : 0 < x < 2\},$$

and the closed interval

$$[0, 2] = \{x \in \mathbf{R} : 0 \le x \le 2\}.$$

Both  sets  are  bounded  above  (and  below),  and  both  have  the  same  least  upper 
bound,  namely  2.  It  is  not  the  case,  however,  that  both  sets  have  a  maximum. 
A  maximum  is  a  specific  type  of  upper  bound  that  is  required  to  be  an  element 
of  the  set  in  question,  and  the  open  interval  (0,  2)  does  not  possess  such  an 
element.  Thus,  the  supremum  can  exist  and  not  be  a  maximum,  but  when  a 
maximum  exists,  then  it  is  also  the  supremum. 

Let’s  turn  our  attention  back  to  the  Axiom  of  Completeness.  Although  we 
can  see  now  that  not  every  nonempty  bounded  set  contains  a  maximum,  the 
Axiom  of  Completeness  asserts  that  every  such  set  does  have  a  least  upper 
bound. We are not going to prove this. An axiom in mathematics is an accepted
assumption, to be used without proof. Preferably, an axiom should be
an  elementary  statement  about  the  system  in  question  that  is  so  fundamental 
that  it  seems  to  need  no  justification.  Perhaps  the  Axiom  of  Completeness  fits 
this  description,  and  perhaps  it  does  not.  Before  deciding,  let’s  remind  ourselves 
why it is not a valid statement about Q.

Example 1.3.6. Consider again the set

$$S = \{r \in \mathbf{Q} : r^2 < 2\},$$

and  pretend  for  the  moment  that  our  world  consists  only  of  rational  numbers. 
The  set  S  is  certainly  bounded  above.  Taking  b  =  2  works,  as  does  b  =  3/2.  But 
notice  what  happens  as  we  go  in  search  of  the  least  upper  bound.  (It  may  be 
useful here to know that the decimal expansion for $\sqrt{2}$ begins 1.4142....) We
might  try  b  =  142/100,  which  is  indeed  an  upper  bound,  but  then  we  discover 
that  b  =  1415/1000  is  an  upper  bound  that  is  smaller  still.  Is  there  a  smallest 
one? 

In  the  rational  numbers,  there  is  not.  In  the  real  numbers,  there  is.  Back 
in R, the Axiom of Completeness states that we may set $\alpha = \sup S$ and be
confident that such a number exists. In the next section, we will prove that
$\alpha^2 = 2$. But according to Theorem 1.1.1, this implies $\alpha$ is not a rational
number. If we are restricting our attention to only rational numbers, then $\alpha$
is not an allowable option for sup S, and the search for a least upper bound
goes  on  indefinitely.  Whatever  rational  upper  bound  is  discovered,  it  is  always 
possible  to  find  one  smaller. 
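
One concrete way to see that a rational upper bound for S can always be improved is sketched below. It assumes a particular update rule that is not taken from the text: given a positive rational b with b^2 > 2, replace it by the average of b and 2/b. This average is again rational, is strictly smaller than b, and still has square greater than 2, so it is again an upper bound for S. The name smaller_upper_bound is hypothetical.

```python
from fractions import Fraction

def smaller_upper_bound(b):
    """Given rational b > 0 with b*b > 2, return a smaller rational with the same property."""
    # (b + 2/b)/2 < b because 2/b < b, and its square exceeds 2
    # because ((b + 2/b)/2)**2 - 2 == ((b - 2/b)/2)**2 > 0.
    return (b + 2 / b) / 2

b = Fraction(142, 100)   # the upper bound 142/100 mentioned in the example
bounds = [b]
for _ in range(4):
    b = smaller_upper_bound(b)
    bounds.append(b)

for x in bounds:
    print(x, float(x))
```

Each printed value is a rational upper bound for S that is smaller than the one before it, which is exactly the never-ending search described above.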

The  tools  needed  to  carry  out  the  computations  described  in  Example  1.3.6 
depend  on  results  about  how  Q  and  N  fit  inside  of  R.  These  are  discussed  in  the 
next  section.  In  the  meantime,  it  is  possible  to  prove  some  intuitive  algebraic 
properties  of  least  upper  bounds  just  using  the  definition. 

Example 1.3.7. Let $A \subseteq \mathbf{R}$ be nonempty and bounded above, and let $c \in \mathbf{R}$.
Define the set c + A by

$$c + A = \{c + a : a \in A\}.$$

Then sup(c + A) = c + sup A.

To properly verify this we focus separately on each part of Definition 1.3.2.
Setting s = sup A, we see that $a \le s$ for all $a \in A$, which implies $c + a \le c + s$ for
all $a \in A$. Thus, c + s is an upper bound for c + A and condition (i) is verified.

For (ii), let b be an arbitrary upper bound for c + A; i.e., $c + a \le b$ for all
$a \in A$. This is equivalent to $a \le b - c$ for all $a \in A$, from which we conclude that
b − c is an upper bound for A. Because s is the least upper bound of A, $s \le b - c$,
which can be rewritten as $c + s \le b$. This verifies part (ii) of Definition 1.3.2,
and we conclude sup(c + A) = c + sup A.
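
For a finite set the supremum is simply the maximum, so the identity sup(c + A) = c + sup A can be checked directly in that special case. The sketch below assumes an arbitrary finite sample set A and shift c, both chosen only for illustration.

```python
from fractions import Fraction

# Finite sample set, so sup A coincides with max A.
A = {Fraction(1, n) for n in range(1, 21)}
c = Fraction(5, 2)

# The shifted set c + A = {c + a : a in A}.
c_plus_A = {c + a for a in A}

print(max(c_plus_A) == c + max(A))
```

The finite case hides the real content of the example, which is that the argument uses only the two defining properties of a least upper bound and therefore also covers sets with no maximum.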

There  is  an  equivalent  and  useful  way  of  characterizing  least  upper  bounds. 
As  the  previous  example  illustrates,  Definition  1.3.2  of  the  supremum  has  two 
parts.  Part  (i)  says  that  sup  A  must  be  an  upper  bound,  and  part  (ii)  states 
that  it  must  be  the  smallest  one.  The  following  lemma  offers  an  alternative  way 
to  restate  part  (ii). 

Lemma 1.3.8. Assume $s \in \mathbf{R}$ is an upper bound for a set $A \subseteq \mathbf{R}$. Then,
s = sup A if and only if, for every choice of $\epsilon > 0$, there exists an element $a \in A$
satisfying $s - \epsilon < a$.



Proof.  Here  is  a  short  rephrasing  of  the  lemma:  Given  that  s  is  an  upper  bound, 
s  is  the  least  upper  bound  if  and  only  if  any  number  smaller  than  s  is  not  an 
upper  bound.  Putting  it  this  way  almost  qualifies  as  a  proof,  but  we  will  expand 
on  what  exactly  is  being  said  in  each  direction. 

(=>)  For  the  forward  direction,  we  assume  s  =  sup  A  and  consider  s  —  e,  where 
e  >  0  has  been  arbitrarily  chosen.  Because  s  —  e  <  s,  part  (ii)  of  Definition  1.3.2 
implies  that  s  —  e  is  not  an  upper  bound  for  A.  If  this  is  the  case,  then  there 
must  be  some  element  a  E  A  for  which  s  —  e  <  a  (because  otherwise  s  —  e  would 
be  an  upper  bound).  This  proves  the  lemma  in  one  direction. 

($\Leftarrow$) Conversely, assume $s$ is an upper bound with the property that no
matter how $\epsilon > 0$ is chosen, $s - \epsilon$ is no longer an upper bound for $A$. Notice
that what this implies is that if $b$ is any number less than $s$, then $b$ is not an
upper bound. (Just let $\epsilon = s - b$.) To prove that $s = \sup A$, we must verify part
(ii) of Definition 1.3.2. (Read it again.) Because we have just argued that any
number smaller than $s$ cannot be an upper bound, it follows that if $b$ is some
other upper bound for $A$, then $s \le b$. □
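The criterion in Lemma 1.3.8 can be tried out on a concrete set. The following is only an illustrative sketch, not part of the proof: it assumes the sample set $A = \{1 - 1/n : n \in \mathbf{N}\}$ and a hypothetical helper `witness` that searches for an element $a$ with $s - \epsilon < a$, using exact rational arithmetic.

```python
from fractions import Fraction

def witness(s, eps, n_max=10**6):
    """Search A = {1 - 1/n : n in N} for an element a with s - eps < a."""
    for n in range(1, n_max + 1):
        a = 1 - Fraction(1, n)
        if s - eps < a:
            return a
    return None  # no witness found within the (arbitrary) search bound

s = Fraction(1)  # an upper bound for A; the lemma's criterion says it is sup A
for eps in (Fraction(1, 2), Fraction(1, 10), Fraction(1, 1000)):
    a = witness(s, eps)
    assert a is not None and s - eps < a <= s

# A larger upper bound such as 2 fails the criterion for eps = 1/2,
# because every element of A is below 1 < 2 - 1/2.
assert witness(Fraction(2), Fraction(1, 2), n_max=10_000) is None
```

A finite search can only suggest the conclusion; the lemma itself is what guarantees that a witness exists for every $\epsilon > 0$ when $s = \sup A$.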

It  is  certainly  the  case  that  all  of  our  conclusions  to  this  point  about  least 
upper  bounds  have  analogous  versions  for  greatest  lower  bounds.  The  Axiom  of 
Completeness  does  not  explicitly  assert  that  a  nonempty  set  bounded  below  has 
an  infimum,  but  this  is  because  we  do  not  need  to  assume  this  fact  as  part  of 
the  axiom.  Using  the  Axiom  of  Completeness,  there  are  several  ways  to  prove 
that  greatest  lower  bounds  exist  for  nonempty  bounded  sets.  One  such  proof  is 
explored  in  Exercise  1.3.3. 

Exercises 

Exercise 1.3.1. (a) Write a formal definition in the style of Definition 1.3.2
for the infimum or greatest lower bound of a set.

(b)  Now,  state  and  prove  a  version  of  Lemma  1.3.8  for  greatest  lower  bounds. 

Exercise  1.3.2.  Give  an  example  of  each  of  the  following,  or  state  that  the 
request  is  impossible. 

(a) A set $B$ with $\inf B \ge \sup B$.

(b)  A  finite  set  that  contains  its  infimum  but  not  its  supremum. 

(c)  A  bounded  subset  of  Q  that  contains  its  supremum  but  not  its  infimum. 

Exercise 1.3.3. (a) Let $A$ be nonempty and bounded below, and define
$B = \{b \in \mathbf{R} : b \text{ is a lower bound for } A\}$. Show that $\sup B = \inf A$.

(b)  Use  (a)  to  explain  why  there  is  no  need  to  assert  that  greatest  lower  bounds 
exist  as  part  of  the  Axiom  of  Completeness. 

Exercise 1.3.4. Let $A_1, A_2, A_3, \ldots$ be a collection of nonempty sets, each of
which is bounded above.

(a) Find a formula for $\sup(A_1 \cup A_2)$. Extend this to $\sup\left(\bigcup_{k=1}^{n} A_k\right)$.

(b) Consider $\sup\left(\bigcup_{k=1}^{\infty} A_k\right)$. Does the formula in (a) extend to the infinite
case?

Exercise 1.3.5. As in Example 1.3.7, let $A \subseteq \mathbf{R}$ be nonempty and bounded
above, and let $c \in \mathbf{R}$. This time define the set $cA = \{ca : a \in A\}$.

(a) If $c \ge 0$, show that $\sup(cA) = c \sup A$.

(b) Postulate a similar type of statement for $\sup(cA)$ for the case $c < 0$.

Exercise 1.3.6. Given sets $A$ and $B$, define $A + B = \{a + b : a \in A \text{ and } b \in B\}$.
Follow these steps to prove that if $A$ and $B$ are nonempty and bounded above
then $\sup(A + B) = \sup A + \sup B$.

(a) Let $s = \sup A$ and $t = \sup B$. Show $s + t$ is an upper bound for $A + B$.

(b) Now let $u$ be an arbitrary upper bound for $A + B$, and temporarily fix
$a \in A$. Show $t \le u - a$.

(c) Finally, show $\sup(A + B) = s + t$.

(d)  Construct  another  proof  of  this  same  fact  using  Lemma  1.3.8. 

Exercise  1.3.7.  Prove  that  if  a  is  an  upper  bound  for  A,  and  if  a  is  also  an 
element  of  A,  then  it  must  be  that  a  =  sup  A. 

Exercise  1.3.8.  Compute,  without  proofs,  the  suprema  and  infima  (if  they 
exist)  of  the  following  sets: 

(a) $\{m/n : m, n \in \mathbf{N} \text{ with } m < n\}$.

(b) $\{(-1)^m/n : m, n \in \mathbf{N}\}$.

(c) $\{n/(3n + 1) : n \in \mathbf{N}\}$.

(d) $\{m/(m + n) : m, n \in \mathbf{N}\}$.

Exercise 1.3.9. (a) If $\sup A < \sup B$, show that there exists an element
$b \in B$ that is an upper bound for $A$.

(b) Give an example to show that this is not always the case if we only assume
$\sup A \le \sup B$.

Exercise  1.3.10  (Cut  Property).  The  Cut  Property  of  the  real  numbers  is 
the  following: 

If $A$ and $B$ are nonempty, disjoint sets with $A \cup B = \mathbf{R}$ and $a < b$ for all
$a \in A$ and $b \in B$, then there exists $c \in \mathbf{R}$ such that $x \le c$ whenever $x \in A$ and
$x \ge c$ whenever $x \in B$.

(a)  Use  the  Axiom  of  Completeness  to  prove  the  Cut  Property. 

(b) Show that the implication goes the other way; that is, assume $\mathbf{R}$ possesses
the Cut Property and let $E$ be a nonempty set that is bounded above.
Prove $\sup E$ exists.

(c)  The  punchline  of  parts  (a)  and  (b)  is  that  the  Cut  Property  could  be  used 
in  place  of  the  Axiom  of  Completeness  as  the  fundamental  axiom  that 
distinguishes  the  real  numbers  from  the  rational  numbers.  To  drive  this 
point  home,  give  a  concrete  example  showing  that  the  Cut  Property  is  not 
a  valid  statement  when  R  is  replaced  by  Q. 

Exercise  1.3.11.  Decide  if  the  following  statements  about  suprema  and  infima 
are  true  or  false.  Give  a  short  proof  for  those  that  are  true.  For  any  that  are 
false,  supply  an  example  where  the  claim  in  question  does  not  appear  to  hold. 

(a) If $A$ and $B$ are nonempty, bounded, and satisfy $A \subseteq B$, then $\sup A \le \sup B$.

(b) If $\sup A < \inf B$ for sets $A$ and $B$, then there exists a $c \in \mathbf{R}$ satisfying
$a < c < b$ for all $a \in A$ and $b \in B$.

(c) If there exists a $c \in \mathbf{R}$ satisfying $a < c < b$ for all $a \in A$ and $b \in B$, then
$\sup A < \inf B$.

1.4  Consequences  of  Completeness 

The  first  application  of  the  Axiom  of  Completeness  is  a  result  that  may  look 
like  a  more  natural  way  to  mathematically  express  the  sentiment  that  the  real 
line  contains  no  gaps. 

Theorem 1.4.1 (Nested Interval Property). For each $n \in \mathbf{N}$, assume we
are given a closed interval $I_n = [a_n, b_n] = \{x \in \mathbf{R} : a_n \le x \le b_n\}$. Assume
also that each $I_n$ contains $I_{n+1}$. Then, the resulting nested sequence of closed
intervals

\[ I_1 \supseteq I_2 \supseteq I_3 \supseteq I_4 \supseteq \cdots \]

has a nonempty intersection; that is, $\bigcap_{n=1}^{\infty} I_n \ne \emptyset$.

Proof. In order to show that $\bigcap_{n=1}^{\infty} I_n$ is not empty, we are going to use the
Axiom of Completeness (AoC) to produce a single real number $x$ satisfying
$x \in I_n$ for every $n \in \mathbf{N}$. Now, AoC is a statement about bounded sets, and the
one we want to consider is the set

\[ A = \{a_n : n \in \mathbf{N}\} \]

of left-hand endpoints of the intervals.


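[Figure: number line showing the left-hand endpoints $a_1, a_2, a_3, \ldots, a_n, \ldots$ (the set $A = \{a_n : n \in \mathbf{N}\}$) lying to the left of the right-hand endpoints $\ldots, b_n, \ldots, b_3, b_2, b_1$.]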

Because the intervals are nested, we see that every $b_n$ serves as an upper bound
for $A$. Thus, we are justified in setting

\[ x = \sup A. \]

Now, consider a particular $I_n = [a_n, b_n]$. Because $x$ is an upper bound for $A$,
we have $a_n \le x$. The fact that each $b_n$ is an upper bound for $A$ and that $x$ is
the least upper bound implies $x \le b_n$.

Altogether then, we have $a_n \le x \le b_n$, which means $x \in I_n$ for every choice
of $n \in \mathbf{N}$. Hence, $x \in \bigcap_{n=1}^{\infty} I_n$, and the intersection is not empty. □
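The structure of this proof can be observed numerically. The sketch below is an illustration under assumptions not made in the text: it builds a particular nested sequence by bisecting $[1, 2]$ (the hypothetical helper `nested_intervals`), and it uses the largest left endpoint computed so far as a finite stand-in for $x = \sup A$.

```python
from fractions import Fraction

def nested_intervals(steps):
    """Bisect [1, 2], always keeping the half [a, b] with a^2 <= 2 <= b^2."""
    a, b = Fraction(1), Fraction(2)
    intervals = [(a, b)]
    for _ in range(steps):
        mid = (a + b) / 2
        if mid * mid <= 2:
            a = mid
        else:
            b = mid
        intervals.append((a, b))
    return intervals

intervals = nested_intervals(20)
lefts = [a for a, _ in intervals]
rights = [b for _, b in intervals]

# Nested: left endpoints never decrease, right endpoints never increase.
assert all(x <= y for x, y in zip(lefts, lefts[1:]))
assert all(x >= y for x, y in zip(rights, rights[1:]))

# Every b_n is an upper bound for the left endpoints, so the largest left
# endpoint found so far lies in every interval computed so far.
x = max(lefts)
assert all(a <= x <= b for a, b in intervals)
```

For finitely many intervals the maximum of the left endpoints always exists; the Axiom of Completeness is what supplies the corresponding supremum when the sequence of intervals is infinite.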


The  Density  of  Q  in  R 

The  set  Q  is  an  extension  of  N,  and  R  in  turn  is  an  extension  of  Q.  The  next 
few  results  indicate  how  N  and  Q  sit  inside  of  R. 

Theorem 1.4.2 (Archimedean Property). (i) Given any number $x \in \mathbf{R}$,
there exists an $n \in \mathbf{N}$ satisfying $n > x$.

(ii) Given any real number $y > 0$, there exists an $n \in \mathbf{N}$ satisfying $1/n < y$.

Proof.  Part  (i)  of  the  proposition  states  that  N  is  not  bounded  above.  There 
has  never  been  any  doubt  about  the  truth  of  this,  and  it  could  be  reasonably 
argued  that  we  should  not  have  to  prove  it  at  all,  especially  in  light  of  the  fact 
that  we  have  decided  to  take  other  familiar  properties  of  N,  Z,  and  Q  as  given. 

The  counterargument  is  that  there  is  still  a  great  deal  of  mystery  about 
what  the  real  numbers  actually  are.  What  we  have  said  so  far  is  that  R  is  an 
extension  of  Q  that  maintains  the  algebraic  and  order  properties  of  the  rationals 
but  also  possesses  the  least  upper  bound  property  articulated  in  the  Axiom  of 
Completeness.  In  the  absence  of  any  other  information  about  R,  we  have  to 
consider  the  possibility  that  in  extending  Q  we  unwittingly  acquired  some  new 
numbers  that  are  upper  bounds  for  N.  In  fact,  as  disorienting  as  it  may  sound, 
there  are  ordered  field  extensions  of  Q  that  include  “numbers”  bigger  than  every 
natural  number.  Theorem  1.4.2  asserts  that  the  real  numbers  do  not  contain 
such  exotic  creatures.  The  Axiom  of  Completeness,  which  we  adopted  to  patch 
up  the  holes  in  Q,  carries  with  it  the  implication  that  N  is  an  unbounded  subset 
of  R. 

And so to the proof. Assume, for contradiction, that $\mathbf{N}$ is bounded above.
By the Axiom of Completeness (AoC), $\mathbf{N}$ should then have a least upper bound,
and we can set $\alpha = \sup \mathbf{N}$. If we consider $\alpha - 1$, then we no longer have an
upper bound (see Lemma 1.3.8), and therefore there exists an $n \in \mathbf{N}$ satisfying
$\alpha - 1 < n$. But this is equivalent to $\alpha < n + 1$. Because $n + 1 \in \mathbf{N}$, we have
a contradiction to the fact that $\alpha$ is supposed to be an upper bound for $\mathbf{N}$.
(Notice that the contradiction here depends only on AoC and the fact that $\mathbf{N}$
is closed under addition.)

Part (ii) follows from (i) by letting $x = 1/y$. □
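For rational inputs, both parts of the theorem can be made completely explicit. The following sketch is an illustration only; the function names `exceeds` and `reciprocal_below` are hypothetical, and the code relies on the floor function, so it demonstrates the statement rather than proving it.

```python
from fractions import Fraction
from math import floor

def exceeds(x):
    """Return a natural number n with n > x, as in part (i)."""
    return max(1, floor(x) + 1)

def reciprocal_below(y):
    """Return a natural number n with 1/n < y for y > 0, as in part (ii)."""
    if y <= 0:
        raise ValueError("y must be positive")
    # Part (ii) follows from part (i) by letting x = 1/y.
    return exceeds(1 / Fraction(y))

y = Fraction(3, 1000)
n = reciprocal_below(y)
assert Fraction(1, n) < y
```

The second function mirrors the last line of the proof: it simply applies the first one to $x = 1/y$.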

This  familiar  property  of  N  is  the  key  to  an  extremely  important  fact  about 
how  Q  fits  inside  of  R. 

Theorem 1.4.3 (Density of $\mathbf{Q}$ in $\mathbf{R}$). For every two real numbers $a$ and $b$
with $a < b$, there exists a rational number $r$ satisfying $a < r < b$.

Proof. A rational number is a quotient of integers, so we must produce $m \in \mathbf{Z}$
and $n \in \mathbf{N}$ so that

\[ a < \frac{m}{n} < b. \tag{1} \]

The  first  step  is  to  choose  the  denominator  n  large  enough  so  that  consecutive 
increments  of  size  1/n  are  too  close  together  to  “step  over”  the  interval  (a,  b). 

[Figure: number line with tick marks at $0, \frac{1}{n}, \frac{2}{n}, \frac{3}{n}, \ldots, \frac{m-1}{n}, \frac{m}{n}$ and the points $a$ and $b$ marked.]

Using the Archimedean Property (Theorem 1.4.2), we may pick $n \in \mathbf{N}$ large
enough so that

\[ \frac{1}{n} < b - a. \tag{2} \]

Inequality (1) (which we are trying to prove) is equivalent to $na < m < nb$.
With $n$ already chosen, the idea now is to choose $m$ to be the smallest integer
greater than $na$. In other words, pick $m \in \mathbf{Z}$ so that

\[ m - 1 \overset{(3)}{\le} na \overset{(4)}{<} m. \]

Now, inequality (4) immediately yields $a < m/n$, which is half of the battle.
Keeping in mind that inequality (2) is equivalent to $a < b - 1/n$, we can use (3)
to write

\[
\begin{aligned}
m &\le na + 1 \\
  &< n\left(b - \frac{1}{n}\right) + 1 \\
  &= nb.
\end{aligned}
\]

Because $m < nb$ implies $m/n < b$, we have $a < m/n < b$, as desired. □
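The two choices made in this proof, first $n$ and then $m$, translate directly into a short computation. The sketch below is illustrative only and assumes the endpoints are given as exact rationals (or values convertible to `Fraction`); the function name `rational_between` is hypothetical.

```python
from fractions import Fraction
from math import floor

def rational_between(a, b):
    """Return m/n with a < m/n < b, following the two choices in the proof."""
    a, b = Fraction(a), Fraction(b)
    if not a < b:
        raise ValueError("need a < b")
    n = floor(1 / (b - a)) + 1   # guarantees 1/n < b - a, inequality (2)
    m = floor(n * a) + 1         # smallest integer greater than n*a
    assert m - 1 <= n * a < m    # inequalities (3) and (4)
    return Fraction(m, n)

r = rational_between(Fraction(1, 3), Fraction(2, 5))
assert Fraction(1, 3) < r < Fraction(2, 5)
```

With rational endpoints the midpoint would of course also work; the point of the sketch is to follow the proof's construction, which is the one that carries over to arbitrary real $a < b$.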

Theorem  1.4.3  is  paraphrased  by  saying  that  Q  is  dense  in  R.  Without 
working  too  hard,  we  can  use  this  result  to  show  that  the  irrational  numbers 
are  dense  in  R  as  well. 

Corollary  1.4.4.  Given  any  two  real  numbers  a  <  b,  there  exists  an  irrational 
number  t  satisfying  a  <  t  <  b. 


Proof. Exercise 1.4.5. □


The Existence of Square Roots


It  is  time  to  tend  to  some  unfinished  business  left  over  from  Example  1.3.6  and 
this  chapter’s  opening  discussion. 

Theorem 1.4.5. There exists a real number $\alpha \in \mathbf{R}$ satisfying $\alpha^2 = 2$.

Proof. After reviewing Example 1.3.6, consider the set

\[ T = \{t \in \mathbf{R} : t^2 < 2\} \]

and set $\alpha = \sup T$. We are going to prove $\alpha^2 = 2$ by ruling out the possibilities
$\alpha^2 < 2$ and $\alpha^2 > 2$. Keep in mind that there are two parts to the definition of
$\sup T$, and they will both be important. (This always happens when a supremum
is used in an argument.) The strategy is to demonstrate that $\alpha^2 < 2$ violates
the fact that $\alpha$ is an upper bound for $T$, and $\alpha^2 > 2$ violates the fact that it is
the least upper bound.

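Let’s first see what happens if we assume $\alpha^2 < 2$. In search of an element of
$T$ that is larger than $\alpha$, write

\[
\begin{aligned}
\left(\alpha + \frac{1}{n}\right)^2 &= \alpha^2 + \frac{2\alpha}{n} + \frac{1}{n^2} \\
&\le \alpha^2 + \frac{2\alpha}{n} + \frac{1}{n} \\
&= \alpha^2 + \frac{2\alpha + 1}{n}.
\end{aligned}
\]

But now assuming $\alpha^2 < 2$ gives us a little space in which to fit the $(2\alpha + 1)/n$
term and keep the total less than 2. Specifically, choose $n_0 \in \mathbf{N}$ large enough
so that

\[ \frac{1}{n_0} < \frac{2 - \alpha^2}{2\alpha + 1}. \]

This implies $(2\alpha + 1)/n_0 < 2 - \alpha^2$, and consequently that

\[ \left(\alpha + \frac{1}{n_0}\right)^2 < \alpha^2 + (2 - \alpha^2) = 2. \]

Thus, $\alpha + 1/n_0 \in T$, contradicting the fact that $\alpha$ is an upper bound for $T$. We
conclude that $\alpha^2 < 2$ cannot happen.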
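The choice of $n_0$ in this case can be tested on rational stand-ins for $\alpha$. The sketch below is only an illustration: the true supremum is not rational, so the code assumes a rational candidate with square less than 2, and the function name `step_up` is hypothetical.

```python
from fractions import Fraction
from math import floor

def step_up(alpha):
    """Given rational alpha >= 0 with alpha^2 < 2, return alpha + 1/n0 in T."""
    alpha = Fraction(alpha)
    if alpha < 0 or alpha * alpha >= 2:
        raise ValueError("need alpha >= 0 and alpha^2 < 2")
    gap = (2 - alpha * alpha) / (2 * alpha + 1)
    n0 = floor(1 / gap) + 1            # ensures 1/n0 < (2 - alpha^2)/(2*alpha + 1)
    bigger = alpha + Fraction(1, n0)
    assert bigger * bigger < 2         # bigger is an element of T larger than alpha
    return bigger

candidate = Fraction(7, 5)             # 49/25 < 2
improved = step_up(candidate)
assert candidate < improved and improved ** 2 < 2
```

Applying `step_up` repeatedly produces ever larger elements of $T$, which is the computational shadow of the statement that no number with square less than 2 can be an upper bound for $T$.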

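Now, what about the case $\alpha^2 > 2$? This time, write

\[
\begin{aligned}
\left(\alpha - \frac{1}{n}\right)^2 &= \alpha^2 - \frac{2\alpha}{n} + \frac{1}{n^2} \\
&> \alpha^2 - \frac{2\alpha}{n}.
\end{aligned}
\]

The remainder of the argument is requested in Exercise 1.4.7. □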


A small modification of this proof can be made to show that $\sqrt{x}$ exists for
any $x \ge 0$. A formula for expanding $(\alpha + 1/n)^m$ called the binomial formula
can be used to show that $\sqrt[m]{x}$ exists for arbitrary values of $m \in \mathbf{N}$.

Exercises


Exercise  1.4.1.  Recall  that  I  stands  for  the  set  of  irrational  numbers. 


(a) Show that if $a, b \in \mathbf{Q}$, then $ab$ and $a + b$ are elements of $\mathbf{Q}$ as well.

(b) Show that if $a \in \mathbf{Q}$ and $t \in \mathbf{I}$, then $a + t \in \mathbf{I}$ and $at \in \mathbf{I}$ as long as $a \ne 0$.

(c) Part (a) can be summarized by saying that $\mathbf{Q}$ is closed under addition and
multiplication. Is $\mathbf{I}$ closed under addition and multiplication? Given two
irrational numbers $s$ and $t$, what can we say about $s + t$ and $st$?

Exercise 1.4.2. Let $A \subseteq \mathbf{R}$ be nonempty and bounded above, and let $s \in \mathbf{R}$
have the property that for all $n \in \mathbf{N}$, $s + \frac{1}{n}$ is an upper bound for $A$ and $s - \frac{1}{n}$
is not an upper bound for $A$. Show $s = \sup A$.

Exercise 1.4.3. Prove that $\bigcap_{n=1}^{\infty} (0, 1/n) = \emptyset$. Notice that this demonstrates
that the intervals in the Nested Interval Property must be closed for the
conclusion of the theorem to hold.


Exercise 1.4.4. Let $a < b$ be real numbers and consider the set $T = \mathbf{Q} \cap [a, b]$.
Show $\sup T = b$.


Exercise 1.4.5. Using Exercise 1.4.1, supply a proof for Corollary 1.4.4 by
considering the real numbers $a - \sqrt{2}$ and $b - \sqrt{2}$.

Exercise 1.4.6. Recall that a set $B$ is dense in $\mathbf{R}$ if an element of $B$ can be
found between any two real numbers $a < b$. Which of the following sets are
dense in $\mathbf{R}$? Take $p \in \mathbf{Z}$ and $q \in \mathbf{N}$ in every case.

(a) The set of all rational numbers $p/q$ with $q \le 10$.

(b) The set of all rational numbers $p/q$ with $q$ a power of 2.

(c) The set of all rational numbers $p/q$ with $10|p| \ge q$.


Exercise 1.4.7. Finish the proof of Theorem 1.4.5 by showing that the
assumption $\alpha^2 > 2$ leads to a contradiction of the fact that $\alpha = \sup T$.


Exercise  1.4.8.  Give  an  example  of  each  or  state  that  the  request  is  impossible. 
When  a  request  is  impossible,  provide  a  compelling  argument  for  why  this  is 
the  case. 


(a) Two sets $A$ and $B$ with $A \cap B = \emptyset$, $\sup A = \sup B$, $\sup A \notin A$ and
$\sup B \notin B$.

(b) A sequence of nested open intervals $J_1 \supseteq J_2 \supseteq J_3 \supseteq \cdots$ with $\bigcap_{n=1}^{\infty} J_n$
nonempty but containing only a finite number of elements.

(c) A sequence of nested unbounded closed intervals $L_1 \supseteq L_2 \supseteq L_3 \supseteq \cdots$
with $\bigcap_{n=1}^{\infty} L_n = \emptyset$. (An unbounded closed interval has the form $[a, \infty) =
\{x \in \mathbf{R} : x \ge a\}$.)

(d) A sequence of closed bounded (not necessarily nested) intervals $I_1, I_2,
I_3, \ldots$ with the property that $\bigcap_{n=1}^{N} I_n \ne \emptyset$ for all $N \in \mathbf{N}$, but $\bigcap_{n=1}^{\infty} I_n = \emptyset$.


1.5 Cardinality

The  applications  of  the  Axiom  of  Completeness  to  this  point  have  basically 
served  to  restore  our  confidence  in  properties  we  already  felt  we  knew  about  the 
real  number  system.  One  final  consequence  of  completeness  that  we  are  about 
to  explore  is  of  a  very  different  nature  and,  on  its  own,  represents  an  astounding 
intellectual  discovery.  The  traditional  way  that  mathematics  gets  done  is  by 
one  mathematician  modifying  and  expanding  on  the  work  of  those  who  came 
before.  This  model  does  not  seem  to  apply  to  Georg  Cantor  (1845-1918),  at 
least  with  regard  to  his  work  on  the  theory  of  infinite  sets. 

At  the  moment,  we  have  an  image  of  R  as  consisting  of  rational  and  irrational 
numbers,  continuously  packed  together  along  the  real  line.  We  have  seen  that 
both  Q  and  I  (the  set  of  irrationals)  are  dense  in  R,  meaning  that  in  every 
interval  (a,  b )  there  exist  rational  and  irrational  numbers  alike.  Mentally,  there 
is  a  temptation  to  think  of  Q  and  I  as  being  intricately  mixed  together  in  equal 
proportions,  but  this  turns  out  not  to  be  the  case.  In  a  way  that  Cantor  made 
precise,  the  irrational  numbers  far  outnumber  the  rational  numbers  in  making 
up  the  real  line. 

1-1 Correspondence

The  term  cardinality  is  used  in  mathematics  to  refer  to  the  size  of  a  set.  The 
cardinalities  of  finite  sets  can  be  compared  simply  by  attaching  a  natural  number 
to  each  set.  The  set  of  Snow  White’s  dwarfs  is  smaller  than  the  set  of  United 
States  Supreme  Court  Justices  because  7  is  less  than  9.  But  how  might  we 
draw  this  same  conclusion  without  referring  to  any  numbers?  Cantor’s  idea  was 
to  attempt  to  put  the  sets  into  a  1-1  correspondence  with  each  other.  There 
are  fewer  dwarfs  than  Justices  because,  if  the  dwarfs  were  all  simultaneously 
appointed  to  the  bench,  there  would  still  be  two  empty  chairs  to  fill.  On  the 
other  hand,  the  cardinality  of  the  Supreme  Court  is  the  same  as  the  cardinality 
of  the  set  of  fielders  on  a  baseball  team.  This  is  because,  when  the  judges  take 
the  field,  it  is  possible  to  arrange  them  so  that  there  is  exactly  one  judge  at 
every  position. 

The  advantage  of  this  method  of  comparing  the  sizes  of  sets  is  that  it  works 
equally  well  on  sets  that  are  infinite. 

Definition 1.5.1. A function $f : A \to B$ is one-to-one (1-1) if $a_1 \ne a_2$ in $A$
implies that $f(a_1) \ne f(a_2)$ in $B$. The function $f$ is onto if, given any $b \in B$, it
is possible to find an element $a \in A$ for which $f(a) = b$.

A function $f : A \to B$ that is both 1-1 and onto provides us with exactly
what  we  mean  by  a  1-1  correspondence  between  two  sets.  The  property  of 
being  1-1  means  that  no  two  elements  of  A  correspond  to  the  same  element  of 
B  (no  two  judges  are  playing  the  same  position),  and  the  property  of  being  onto 
ensures  that  every  element  of  B  corresponds  to  something  in  A  (there  is  a  judge 
at  every  position). 

Definition 1.5.2. The set $A$ has the same cardinality as $B$ if there exists
$f : A \to B$ that is 1-1 and onto. In this case, we write $A \sim B$.

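Example 1.5.3. (i) If we let $E = \{2, 4, 6, \ldots\}$ be the set of even natural
numbers, then we can show $\mathbf{N} \sim E$. To see why, let $f : \mathbf{N} \to E$ be given
by $f(n) = 2n$.

\[
\begin{array}{cccccccc}
\mathbf{N}: & 1 & 2 & 3 & 4 & \cdots & n & \cdots \\
 & \updownarrow & \updownarrow & \updownarrow & \updownarrow & & \updownarrow & \\
E: & 2 & 4 & 6 & 8 & \cdots & 2n & \cdots
\end{array}
\]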


It  is  certainly  true  that  E  is  a  proper  subset  of  N,  and  for  this  reason  it 
may  seem  logical  to  say  that  E  is  a  “smaller”  set  than  N.  This  is  one 
way  to  look  at  it,  but  it  represents  a  point  of  view  that  is  heavily  biased 
from  an  overexposure  to  finite  sets.  The  definition  of  cardinality  is  quite 
specific,  and  from  this  point  of  view  E  and  N  are  equivalent. 

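(ii) To make this point again, note that although $\mathbf{N}$ is contained in $\mathbf{Z}$ as a
proper subset, we can show $\mathbf{N} \sim \mathbf{Z}$. This time let

\[
f(n) =
\begin{cases}
(n - 1)/2 & \text{if } n \text{ is odd} \\
-n/2 & \text{if } n \text{ is even.}
\end{cases}
\]

The important details to verify are that $f$ does not map any two natural
numbers to the same element of $\mathbf{Z}$ ($f$ is 1-1) and that every element of $\mathbf{Z}$
gets “hit” by something in $\mathbf{N}$ ($f$ is onto).

\[
\begin{array}{ccccccccc}
\mathbf{N}: & 1 & 2 & 3 & 4 & 5 & 6 & 7 & \cdots \\
 & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \\
\mathbf{Z}: & 0 & -1 & 1 & -2 & 2 & -3 & 3 & \cdots
\end{array}
\]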
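The two details mentioned in part (ii) can be checked mechanically on an initial stretch of $\mathbf{N}$. This is a sketch for illustration: `f` implements the displayed formula, while `f_inverse` is a hypothetical helper, not given in the text, that recovers $n$ from $f(n)$.

```python
def f(n):
    """Map a natural number n >= 1 to an integer as in Example 1.5.3 (ii)."""
    return (n - 1) // 2 if n % 2 == 1 else -(n // 2)

def f_inverse(z):
    """Recover the unique natural number n with f(n) = z."""
    return 2 * z + 1 if z >= 0 else -2 * z

values = [f(n) for n in range(1, 8)]
assert values == [0, -1, 1, -2, 2, -3, 3]

# 1-1 and onto, checked on a finite window: f and f_inverse undo each other.
assert all(f_inverse(f(n)) == n for n in range(1, 1000))
assert all(f(f_inverse(z)) == z for z in range(-500, 501))
```

Exhibiting a two-sided inverse is one standard way to confirm that a function is both 1-1 and onto; the finite checks here only spot-check the algebra.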


Example 1.5.4. A little calculus (which we will not supply) shows that the
function $f(x) = x/(x^2 - 1)$ takes the interval $(-1, 1)$ onto $\mathbf{R}$ in a 1-1 fashion
(Fig. 1.4). Thus $(-1, 1) \sim \mathbf{R}$. In fact, $(a, b) \sim \mathbf{R}$ for any interval $(a, b)$.


Countable  Sets 

Definition 1.5.5. A set $A$ is countable if $\mathbf{N} \sim A$. An infinite set that is not
countable is called an uncountable set.


Figure 1.4: $(-1, 1) \sim \mathbf{R}$ using $f(x) = x/(x^2 - 1)$.


From  Example  1.5.3,  we  see  that  both  E  and  Z  are  countable  sets.  Putting 
a  set  into  a  1-1  correspondence  with  N,  in  effect,  means  putting  all  of  the 
elements  into  an  infinitely  long  list  or  sequence.  Looking  at  Example  1.5.3,  we 
can  see  that  this  was  quite  easy  to  do  for  E  and  required  only  a  modest  bit 
of  shuffling  for  the  set  Z.  A  natural  question  arises  as  to  whether  all  infinite 
sets  are  countable.  Given  some  infinite  set  such  as  Q  or  R,  it  might  seem  as 
though,  with  enough  cleverness,  we  should  be  able  to  fit  all  the  elements  of  our 
set  into  a  single  list  (i.e.,  into  a  correspondence  with  N).  After  all,  this  list  is 
infinitely  long  so  there  should  be  plenty  of  room.  But  alas,  as  Hardy  remarks, 
“[The  mathematician’s]  subject  is  the  most  curious  of  all — there  is  none  in  which 
truth  plays  such  odd  pranks.” 

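Theorem 1.5.6. (i) The set $\mathbf{Q}$ is countable. (ii) The set $\mathbf{R}$ is uncountable.

Proof. (i) Set $A_1 = \{0\}$ and for each $n \ge 2$, let $A_n$ be the set given by

\[ A_n = \left\{ \pm\frac{p}{q} : \text{where } p, q \in \mathbf{N} \text{ are in lowest terms with } p + q = n \right\}. \]

The first few of these sets look like

\[ A_1 = \{0\}, \quad A_2 = \left\{ \frac{1}{1}, \frac{-1}{1} \right\}, \quad A_3 = \left\{ \frac{1}{2}, \frac{-1}{2}, \frac{2}{1}, \frac{-2}{1} \right\}, \]

\[ A_4 = \left\{ \frac{1}{3}, \frac{-1}{3}, \frac{3}{1}, \frac{-3}{1} \right\}, \quad \text{and} \quad A_5 = \left\{ \frac{1}{4}, \frac{-1}{4}, \frac{2}{3}, \frac{-2}{3}, \frac{3}{2}, \frac{-3}{2}, \frac{4}{1}, \frac{-4}{1} \right\}. \]

The crucial observation is that each $A_n$ is finite and every rational number
appears in exactly one of these sets. Our 1-1 correspondence with $\mathbf{N}$ is then
achieved by consecutively listing the elements in each $A_n$.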
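
\[
\begin{array}{cccccccccccccc}
\mathbf{N}: & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & \cdots \\
 & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \\
\mathbf{Q}: & 0 & \frac{1}{1} & -\frac{1}{1} & \frac{1}{2} & -\frac{1}{2} & \frac{2}{1} & -\frac{2}{1} & \frac{1}{3} & -\frac{1}{3} & \frac{3}{1} & -\frac{3}{1} & \frac{1}{4} & \cdots
\end{array}
\]

Here the entry under 1 is the element of $A_1$, the entries under 2 and 3 make up $A_2$, those under 4 through 7 make up $A_3$, and those under 8 through 11 make up $A_4$.
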

Admittedly, writing an explicit formula for this correspondence would be an
awkward task, and attempting to do so is not the best use of time. What
matters is that we see why every rational number appears in the correspondence
exactly once. Given, say, 22/7, we have that $22/7 \in A_{29}$. Because the set of
elements in $A_1, \ldots, A_{28}$ is finite, we can be confident that 22/7 eventually gets
included in the sequence. The fact that this line of reasoning applies to any
rational number $p/q$ is our proof that the correspondence is onto. To verify
that it is 1-1, we observe that the sets $A_n$ were constructed to be disjoint so
that no rational number appears twice. This completes the proof of (i).
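Although no closed formula is offered for the correspondence, the listing itself is easy to generate. The sketch below is an illustration under one assumption: within each $A_n$ the elements are ordered by increasing numerator, positive before negative, which matches the sets displayed earlier in the proof. The names `block` and `rationals` are hypothetical.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd

def block(n):
    """Return the finite set A_n as a list, in the order described above."""
    if n == 1:
        return [Fraction(0)]
    out = []
    for p in range(1, n):
        q = n - p
        if gcd(p, q) == 1:               # p/q is in lowest terms
            out += [Fraction(p, q), Fraction(-p, q)]
    return out

def rationals():
    """Yield every rational exactly once by listing A_1, A_2, A_3, ... in turn."""
    for n in count(1):
        yield from block(n)

first_twelve = list(islice(rationals(), 12))

# 22/7 belongs to A_29, so it is reached after finitely many steps.
assert Fraction(22, 7) in block(29)
position = next(i for i, r in enumerate(rationals(), start=1) if r == Fraction(22, 7))
```

The variable `position` is the natural number paired with 22/7 under this particular ordering; a different ordering inside each $A_n$ would give a different, equally valid correspondence.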

(ii) The second statement of Theorem 1.5.6 is the truly unexpected part,
and its proof is done by contradiction. Assume that there does exist a 1-1,
onto function $f : \mathbf{N} \to \mathbf{R}$. Again, what this suggests is that it is possible to
enumerate the elements of $\mathbf{R}$. If we let $x_1 = f(1)$, $x_2 = f(2)$, and so on, then
our assumption that $f$ is onto means that we can write

\[ \mathbf{R} = \{x_1, x_2, x_3, x_4, \ldots\} \tag{1} \]

and be confident that every real number appears somewhere on the list. We
will now use the Nested Interval Property (Theorem 1.4.1) to produce a real
number that is not there.

Let $I_1$ be a closed interval that does not contain $x_1$. Next, let $I_2$ be a closed
interval, contained in $I_1$, which does not contain $x_2$. The existence of such an
$I_2$ is easy to verify. Certainly $I_1$ contains two smaller disjoint closed intervals,
and $x_2$ can only be in one of these. In general, given an interval $I_n$, construct
$I_{n+1}$ to satisfy

(i) $I_{n+1} \subseteq I_n$ and

(ii) $x_{n+1} \notin I_{n+1}$.


[Figure: the interval $I_{n+1}$ chosen inside $I_n$ so that it does not contain $x_{n+1}$.]

We now consider the intersection $\bigcap_{n=1}^{\infty} I_n$. If $x_{n_0}$ is some real number from the
list in (1), then we have $x_{n_0} \notin I_{n_0}$, and it follows that

\[ x_{n_0} \notin \bigcap_{n=1}^{\infty} I_n. \]

Now, we are assuming that the list in (1) contains every real number, and this
leads to the conclusion that

\[ \bigcap_{n=1}^{\infty} I_n = \emptyset. \]

However, the Nested Interval Property (NIP) asserts that $\bigcap_{n=1}^{\infty} I_n \ne \emptyset$. By
NIP, there is at least one $x \in \bigcap_{n=1}^{\infty} I_n$ that, consequently, cannot be on the list
in (1). This contradiction means that such an enumeration of $\mathbf{R}$ is impossible,
and we conclude that $\mathbf{R}$ is an uncountable set. □
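The interval construction in part (ii) can be carried out explicitly for any finite list. The following sketch is an illustration under stated assumptions: it starts from $[0, 1]$, works with rational numbers only, and uses the outer thirds of each interval as the "two smaller disjoint closed intervals". The names `avoid` and `build_intervals` are hypothetical.

```python
from fractions import Fraction

def avoid(interval, x):
    """Return a closed subinterval of `interval` that does not contain x."""
    a, b = interval
    third = (b - a) / 3
    left, right = (a, a + third), (b - third, b)
    # The two outer thirds are disjoint, so x lies in at most one of them.
    return right if left[0] <= x <= left[1] else left

def build_intervals(listed, start=(Fraction(0), Fraction(1))):
    """Build nested closed intervals I_1, I_2, ... with x_n not in I_n."""
    intervals = []
    current = start
    for x in listed:
        current = avoid(current, x)
        intervals.append(current)
    return intervals

listed = [Fraction(k, 10) for k in range(11)]
intervals = build_intervals(listed)

# Any point of the last interval lies in every I_n, hence differs from every x_n.
y = intervals[-1][0]
assert all(a <= y <= b for a, b in intervals)
assert y not in listed
```

For a finite list this is unremarkable, since some number is always left over. The force of the proof is that the Nested Interval Property keeps the intersection nonempty even after infinitely many steps.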

What  exactly  should  we  make  of  this  discovery?  It  is  an  important  exercise 
to  show  that  any  subset  of  a  countable  set  must  be  either  countable  or  finite. 
This  should  not  be  too  surprising.  If  a  set  can  be  arranged  into  a  single  list,  then 
deleting  some  elements  from  this  list  results  in  another  (shorter,  and  potentially 
terminating)  list.  This  means  that  countable  sets  are  the  smallest  type  of  infinite 
set.  Anything  smaller  is  either  still  countable  or  finite. 

The force of Theorem 1.5.6 is that the cardinality of $\mathbf{R}$ is, informally speaking,
a larger type of infinity. The real numbers so outnumber the natural numbers
that there is no way to map $\mathbf{N}$ onto $\mathbf{R}$. No matter how we attempt this,
there are always real numbers to spare. The set $\mathbf{Q}$, on the other hand, is countable.
As far as infinite sets are concerned, this is as small as it gets. What does
this imply about the set $\mathbf{I}$ of irrational numbers? By imitating the demonstration
that $\mathbf{N} \sim \mathbf{Z}$, we can prove that the union of two countable sets must be
countable. Because $\mathbf{R} = \mathbf{Q} \cup \mathbf{I}$, it follows that $\mathbf{I}$ cannot be countable because
otherwise $\mathbf{R}$ would be. The inescapable conclusion is that, despite the fact that
we have encountered so few of them, the irrational numbers form a far greater
subset of $\mathbf{R}$ than $\mathbf{Q}$.

The  properties  of  countable  sets  described  in  this  discussion  are  useful  for  a 
few  exercises  in  upcoming  chapters.  For  easier  reference,  we  state  them  as  some 
final  propositions  and  outline  their  proofs  in  the  exercises  that  follow. 

Theorem 1.5.7. If $A \subseteq B$ and $B$ is countable, then $A$ is either countable or
finite.

Theorem 1.5.8. (i) If $A_1, A_2, \ldots, A_m$ are each countable sets, then the union
$A_1 \cup A_2 \cup \cdots \cup A_m$ is countable.

(ii) If $A_n$ is a countable set for each $n \in \mathbf{N}$, then $\bigcup_{n=1}^{\infty} A_n$ is countable.

Exercises 

Exercise  1.5.1.  Finish  the  following  proof  for  Theorem  1.5.7. 

Assume $B$ is a countable set. Thus, there exists $f : \mathbf{N} \to B$, which is 1-1
and onto. Let $A \subseteq B$ be an infinite subset of $B$. We must show that $A$ is
countable.

Let $n_1 = \min\{n \in \mathbf{N} : f(n) \in A\}$. As a start to a definition of $g : \mathbf{N} \to A$,
set $g(1) = f(n_1)$. Show how to inductively continue this process to produce a
1-1 function $g$ from $\mathbf{N}$ onto $A$.

Exercise 1.5.2. Review the proof of Theorem 1.5.6, part (ii) showing that $\mathbf{R}$
is uncountable, and then find the flaw in the following erroneous proof that $\mathbf{Q}$
is uncountable:

Assume, for contradiction, that $\mathbf{Q}$ is countable. Thus we can write $\mathbf{Q} =
\{r_1, r_2, r_3, \ldots\}$ and, as before, construct a nested sequence of closed intervals
with $r_n \notin I_n$. Our construction implies $\bigcap_{n=1}^{\infty} I_n = \emptyset$ while NIP implies $\bigcap_{n=1}^{\infty} I_n \ne
\emptyset$. This contradiction implies $\mathbf{Q}$ must therefore be uncountable.

Exercise  1.5.3.  Use  the  following  outline  to  supply  proofs  for  the  statements 
in  Theorem  1.5.8. 

(a) First, prove statement (i) for two countable sets, $A_1$ and $A_2$.
Example 1.5.3 (ii) may be a useful reference. Some technicalities can be avoided
by first replacing $A_2$ with the set $B_2 = A_2 \setminus A_1 = \{x \in A_2 : x \notin A_1\}$. The
point of this is that the union $A_1 \cup B_2$ is equal to $A_1 \cup A_2$ and the sets
$A_1$ and $B_2$ are disjoint. (What happens if $B_2$ is finite?)

Now,  explain  how  the  more  general  statement  in  (i)  follows. 

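(b) Explain why induction cannot be used to prove part (ii) of Theorem 1.5.8
from part (i).

(c) Show how arranging $\mathbf{N}$ into the two-dimensional array

\[
\begin{array}{cccccc}
1 & 3 & 6 & 10 & 15 & \cdots \\
2 & 5 & 9 & 14 & \cdots & \\
4 & 8 & 13 & \cdots & & \\
7 & 12 & \cdots & & & \\
11 & \cdots & & & &
\end{array}
\]

leads to a proof of Theorem 1.5.8 (ii).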

Exercise  1.5.4.  (a)  Show  (a,  b)  ~  R  for  any  interval  (a,  b). 

(b) Show that an unbounded interval like $(a, \infty) = \{x : x > a\}$ has the same
cardinality as $\mathbf{R}$ as well.

(c) Using open intervals makes it more convenient to produce the required
1-1, onto functions, but it is not really necessary. Show that $[0, 1) \sim (0, 1)$
by exhibiting a 1-1 onto function between the two sets.

Exercise  1.5.5.  (a)  Why  is  A  ~  A  for  every  set  A? 

(b)  Given  sets  A  and  B,  explain  why  A  ~  B  is  equivalent  to  asserting  B  ~  A. 

(c)  For  three  sets  A,  B ,  and  C,  show  that  A  ~  B  and  B  ~  C  implies  A  ~  C. 
These  three  properties  are  what  is  meant  by  saying  that  ~  is  an  equivalence 
relation. 

Exercise 1.5.6. (a) Give an example of a countable collection of disjoint
open intervals.

(b)  Give  an  example  of  an  uncountable  collection  of  disjoint  open  intervals, 
or  argue  that  no  such  collection  exists. 

Exercise  1.5.7.  Consider  the  open  interval  (0,1),  and  let  S  be  the  set  of  points 
in  the  open  unit  square;  that  is,  S  =  {(x,  y)  :  0  <  x,y  <  1}. 

(a)  Find  a  1-1  function  that  maps  (0, 1)  into,  but  not  necessarily  onto,  S. 
(This  is  easy.) 

(b)  Use  the  fact  that  every  real  number  has  a  decimal  expansion  to  produce 
a  1-1  function  that  maps  S  into  (0, 1).  Discuss  whether  the  formulated 
function  is  onto.  (Keep  in  mind  that  any  terminating  decimal  expansion 
such  as  .235  represents  the  same  real  number  as  .234999 . . .  .) 

The Schröder-Bernstein Theorem discussed in Exercise 1.5.11 can now be
applied to conclude that (0, 1) ~ S.

Exercise  1.5.8.  Let  B  be  a  set  of  positive  real  numbers  with  the  property  that 
adding  together  any  finite  subset  of  elements  from  B  always  gives  a  sum  of  2  or 
less.  Show  B  must  be  finite  or  countable. 

Exercise 1.5.9. A real number $x \in \mathbf{R}$ is called algebraic if there exist integers
$a_0, a_1, a_2, \ldots, a_n \in \mathbf{Z}$, not all zero, such that

$$a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 = 0.$$


Said  another  way,  a  real  number  is  algebraic  if  it  is  the  root  of  a  polynomial  with 
integer  coefficients.  Real  numbers  that  are  not  algebraic  are  called  transcenden¬ 
tal  numbers.  Reread  the  last  paragraph  of  Section  1.1.  The  final  question  posed 
here  is  closely  related  to  the  question  of  whether  or  not  transcendental  numbers 
exist. 

(a) Show that $\sqrt{2}$, $\sqrt[3]{2}$, and $\sqrt{3} + \sqrt{2}$ are algebraic.

(b) Fix $n \in \mathbf{N}$, and let $A_n$ be the algebraic numbers obtained as roots of polynomials with integer coefficients that have degree n. Using the fact that
every polynomial has a finite number of roots, show that $A_n$ is countable.

(c)  Now,  argue  that  the  set  of  all  algebraic  numbers  is  countable.  What  may 
we  conclude  about  the  set  of  transcendental  numbers? 


Exercise 1.5.10. (a) Let $C \subseteq [0, 1]$ be uncountable. Show that there exists
$a \in (0, 1)$ such that $C \cap [a, 1]$ is uncountable.

(b) Now let $A$ be the set of all $a \in (0, 1)$ such that $C \cap [a, 1]$ is uncountable,
and set $\alpha = \sup A$. Is $C \cap [\alpha, 1]$ an uncountable set?


(c) Does the statement in (a) remain true if “uncountable” is replaced by
“infinite”?

Exercise 1.5.11 (Schröder-Bernstein Theorem). Assume there exists a
1-1 function $f : X \to Y$ and another 1-1 function $g : Y \to X$. Follow the steps
to show that there exists a 1-1, onto function $h : X \to Y$ and hence $X \sim Y$.
The strategy is to partition X and Y into components

$$X = A \cup A' \quad \text{and} \quad Y = B \cup B'$$

with $A \cap A' = \emptyset$ and $B \cap B' = \emptyset$, in such a way that $f$ maps $A$ onto $B$, and $g$
maps $B'$ onto $A'$.

(a) Explain how achieving this would lead to a proof that $X \sim Y$.

(b) Set $A_1 = X \setminus g(Y) = \{x \in X : x \notin g(Y)\}$ (what happens if $A_1 = \emptyset$?) and
inductively define a sequence of sets by letting $A_{n+1} = g(f(A_n))$. Show
that $\{A_n : n \in \mathbf{N}\}$ is a pairwise disjoint collection of subsets of $X$, while
$\{f(A_n) : n \in \mathbf{N}\}$ is a similar collection in $Y$.

(c) Let $A = \bigcup_{n=1}^{\infty} A_n$ and $B = \bigcup_{n=1}^{\infty} f(A_n)$. Show that $f$ maps $A$ onto $B$.

(d) Let $A' = X \setminus A$ and $B' = Y \setminus B$. Show $g$ maps $B'$ onto $A'$.
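
To see the construction in action, it can help to run it on one concrete pair of maps. The Python sketch below is an illustration under stated assumptions, not part of the exercise: it takes X = Y = the positive integers with the hypothetical 1-1 maps f(x) = 2x and g(y) = 3y. In that case $A_1$ is the set of integers not divisible by 3 and $A_{n+1} = 6 A_n$, so membership in A can be tested by stripping factors of 6.

```python
def f(x):
    """Assumed 1-1 map X -> Y (not onto)."""
    return 2 * x


def g(y):
    """Assumed 1-1 map Y -> X (not onto)."""
    return 3 * y


def in_A(x):
    """True when x lies in A, the union of A_1, A_2, A_3, ..."""
    while x % 6 == 0:     # A_{n+1} = g(f(A_n)) = 6 * A_n, so peel off factors of 6
        x //= 6
    return x % 3 != 0     # A_1 = X \ g(Y): the integers not divisible by 3


def h(x):
    """Use f on A and the inverse of g on A', the complement of A in X."""
    return f(x) if in_A(x) else x // 3


print([h(x) for x in range(1, 13)])
```

On any finite window the values of `h` never repeat, and every small value of Y is hit, which is what parts (c) and (d) predict for the general construction.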

1.6  Cantor’s  Theorem 

Cantor’s  work  into  the  theory  of  infinite  sets  extends  far  beyond  the  conclusions 
of  Theorem  1.5.6.  Although  initially  resisted,  his  creative  and  relentless  assault 
in  this  area  eventually  produced  a  revolution  in  set  theory  and  a  paradigm  shift 
in  the  way  mathematicians  came  to  understand  the  infinite. 

Cantor’s  Diagonalization  Method 

Cantor  published  his  discovery  that  R  is  uncountable  in  1874.  Although  it 
has  some  modern  polish  on  it,  the  argument  presented  in  Theorem  1.5.6  (ii) 
is  actually  quite  similar  to  the  one  Cantor  originally  found.  In  1891,  Cantor 
offered  another  proof  of  this  same  fact  that  is  startling  in  its  simplicity.  It 
relies  on  decimal  representations  for  real  numbers,  which  we  will  accept  and  use 
without  any  formal  definitions. 

Theorem 1.6.1. The open interval $(0, 1) = \{x \in \mathbf{R} : 0 < x < 1\}$ is
uncountable.

Exercise  1.6.1.  Show  that  (0, 1)  is  uncountable  if  and  only  if  R  is  uncountable. 
This  shows  that  Theorem  1.6.1  is  equivalent  to  Theorem  1.5.6. 

Proof. As with Theorem 1.5.6, we proceed by contradiction and assume that
there does exist a function $f : \mathbf{N} \to (0, 1)$ that is 1-1 and onto. For each $m \in \mathbf{N}$,
$f(m)$ is a real number between 0 and 1, and we represent it using the decimal
notation

$$f(m) = .a_{m1} a_{m2} a_{m3} a_{m4} a_{m5} \ldots.$$

What is meant here is that for each $m, n \in \mathbf{N}$, $a_{mn}$ is the digit from the set
{0, 1, 2, ..., 9} that represents the nth digit in the decimal expansion of $f(m)$.
The 1-1 correspondence between N and (0, 1) can be summarized in the doubly
indexed array


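$$\begin{array}{ccccccccccc}
\mathbf{N} & & (0,1) & & & & & & & & \\
1 & \longleftrightarrow & f(1) & = & .a_{11} & a_{12} & a_{13} & a_{14} & a_{15} & a_{16} & \cdots \\
2 & \longleftrightarrow & f(2) & = & .a_{21} & a_{22} & a_{23} & a_{24} & a_{25} & a_{26} & \cdots \\
3 & \longleftrightarrow & f(3) & = & .a_{31} & a_{32} & a_{33} & a_{34} & a_{35} & a_{36} & \cdots \\
4 & \longleftrightarrow & f(4) & = & .a_{41} & a_{42} & a_{43} & a_{44} & a_{45} & a_{46} & \cdots \\
5 & \longleftrightarrow & f(5) & = & .a_{51} & a_{52} & a_{53} & a_{54} & a_{55} & a_{56} & \cdots \\
6 & \longleftrightarrow & f(6) & = & .a_{61} & a_{62} & a_{63} & a_{64} & a_{65} & a_{66} & \cdots \\
\vdots & & \vdots & & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{array}$$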

The  key  assumption  about  this  correspondence  is  that  every  real  number  in 
(0, 1)  is  assumed  to  appear  somewhere  on  the  list. 

Now for the pearl of the argument. Define a real number $x \in (0, 1)$ with the
decimal expansion $x = .b_1 b_2 b_3 b_4 \ldots$ using the rule

$$b_n = \begin{cases} 2 & \text{if } a_{nn} \neq 2 \\ 3 & \text{if } a_{nn} = 2. \end{cases}$$

Let’s be clear about this. To compute the digit $b_1$, we look at the digit $a_{11}$ in
the upper left-hand corner of the array. If $a_{11} = 2$, then we choose $b_1 = 3$;
otherwise, we set $b_1 = 2$.
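
The rule is mechanical enough to run on a finite stand-in for the list. The Python sketch below is only an illustration: the six digit strings in `listing` are made-up sample data, and `diagonal_number` is a name introduced here. It applies the rule to the first six diagonal digits and produces the first six digits of x.

```python
def diagonal_number(rows):
    """Apply the rule b_n = 3 if a_nn = 2, else 2, to a list of digit strings."""
    digits = []
    for n, row in enumerate(rows):
        a_nn = row[n]                           # diagonal digit (0-indexed here)
        digits.append("3" if a_nn == "2" else "2")
    return "." + "".join(digits)


# Hypothetical first six digits of f(1), ..., f(6).
listing = ["141592", "222222", "718281", "500000", "333333", "123452"]
x = diagonal_number(listing)
print(x)
for n, row in enumerate(listing):
    # x differs from the n-th listed number in the n-th digit.
    print(n + 1, row[n], x[n + 1])
```

Each printed row shows a diagonal digit next to the corresponding digit of x, and the two never agree.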

Exercise 1.6.2. (a) Explain why the real number $x = .b_1 b_2 b_3 b_4 \ldots$ cannot
be $f(1)$.

(b) Now, explain why $x \neq f(2)$, and in general why $x \neq f(n)$ for any $n \in \mathbf{N}$.

(c) Point out the contradiction that arises from these observations and conclude that (0, 1) is uncountable. □

Exercise  1.6.3.  Supply  rebuttals  to  the  following  complaints  about  the  proof 
of  Theorem  1.6.1. 


(a)  Every  rational  number  has  a  decimal  expansion,  so  we  could  apply  this 
same  argument  to  show  that  the  set  of  rational  numbers  between  0  and  1 
is  uncountable.  However,  because  we  know  that  any  subset  of  Q  must  be 
countable, the proof of Theorem 1.6.1 must be flawed.

(b)  Some  numbers  have  two  different  decimal  representations.  Specifically, 
any  decimal  expansion  that  terminates  can  also  be  written  with  repeating 
9’s.  For  instance,  1/2  can  be  written  as  .5  or  as  .4999....  Doesn’t  this 
cause  some  problems? 

Exercise 1.6.4. Let S be the set consisting of all sequences of 0’s and 1’s.
Observe that S is not a particular sequence, but rather a large set whose elements are sequences; namely,

$$S = \{(a_1, a_2, a_3, \ldots) : a_n = 0 \text{ or } 1\}.$$

As an example, the sequence (1, 0, 1, 0, 1, 0, 1, 0, ...) is an element of S, as is the
sequence (1, 1, 1, 1, 1, 1, ...).

Give  a  rigorous  argument  showing  that  S  is  uncountable. 

Having distinguished between the countable infinity of N and the uncountable infinity of R, a new question that occupied Cantor was whether or not there
existed  an  infinity  “above”  that  of  R.  This  is  logically  treacherous  territory. 
The  same  care  we  gave  to  defining  the  relationship  “has  the  same  cardinality 
as”  needs  to  be  given  to  defining  relationships  such  as  “has  cardinality  greater 
than”  or  “has  cardinality  less  than  or  equal  to.”  Nevertheless,  without  getting 
too  weighed  down  with  formal  definitions,  one  gets  a  very  clear  sense  from  the 
next  result  that  there  is  a  hierarchy  of  infinite  sets  that  continues  well  beyond 
the  continuum  of  R. 

Power  Sets  and  Cantor’s  Theorem 

Given  a  set  A,  the  power  set  P(A)  refers  to  the  collection  of  all  subsets  of  A.  It 
is  important  to  understand  that  P(A)  is  itself  considered  a  set  whose  elements 
are  the  different  possible  subsets  of  A. 

Exercise 1.6.5. (a) Let A = {a, b, c}. List the eight elements of P(A). (Do
not forget that $\emptyset$ is considered to be a subset of every set.)

(b) If A is finite with n elements, show that P(A) has $2^n$ elements.
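
For small finite sets the power set can simply be enumerated. The Python sketch below is one possible way to do this, using the standard-library function `itertools.combinations`; the helper name `power_set` is an assumption made for this illustration. It lists the subsets of a finite set by size and lets you compare the count with $2^n$.

```python
from itertools import combinations


def power_set(A):
    """All subsets of the finite collection A, as frozensets, grouped by size."""
    elems = list(A)
    return [frozenset(c)
            for r in range(len(elems) + 1)      # subsets of size 0, 1, ..., n
            for c in combinations(elems, r)]


subsets = power_set("abc")
for s in subsets:
    print(sorted(s))
print(len(subsets))
```

A count agreeing with $2^n$ for a few values of n is evidence, not a proof; part (b) still asks for an argument valid for every n.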

Exercise 1.6.6. (a) Using the particular set A = {a, b, c}, exhibit two different 1-1 mappings from A into P(A).

(b) Letting C = {1, 2, 3, 4}, produce an example of a 1-1 map $g : C \to P(C)$.

(c)  Explain  why,  in  parts  (a)  and  (b),  it  is  impossible  to  construct  mappings 
that  are  onto. 

Cantor’s Theorem states that the phenomenon in Exercise 1.6.6 holds for infinite sets as well as finite sets. Whereas mapping A into P(A) is quite effortless,
finding  an  onto  map  is  impossible. 

Theorem 1.6.2 (Cantor’s Theorem). Given any set A, there does not exist
a function $f : A \to P(A)$ that is onto.

Proof. This proof, like the others of its kind, is indirect. Thus, assume, for
contradiction, that $f : A \to P(A)$ is onto. Unlike the usual situation in which
we have sets of numbers for the domain and range, $f$ is a correspondence between
a set and its power set. For each element $a \in A$, $f(a)$ is a particular subset of A.
The assumption that $f$ is onto means that every subset of A appears as $f(a)$
for some $a \in A$. To arrive at a contradiction, we will produce a subset $B \subseteq A$
that is not equal to $f(a)$ for any $a \in A$.

Construct B using the following rule. For each element $a \in A$, consider the
subset $f(a)$. This subset of A may contain the element a or it may not. This
depends on the function $f$. If $f(a)$ does not contain a, then we include a in our
set B. More precisely, let

$$B = \{a \in A : a \notin f(a)\}.$$

Exercise  1.6.7.  Return  to  the  particular  functions  constructed  in  Exercise  1.6.6 
and  construct  the  subset  B  that  results  using  the  preceding  rule.  In  each  case, 
note  that  B  is  not  in  the  range  of  the  function  used. 
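
The rule for B is easy to carry out by machine on a small example. The Python sketch below is illustrative only: the particular map `f` is a made-up sample (it is not required to be 1-1 for the rule to apply), and the name `diagonal_set` is an assumption introduced here.

```python
def diagonal_set(A, f):
    """B = {a in A : a not in f(a)}, with f given as a dict from elements to subsets."""
    return frozenset(a for a in A if a not in f[a])


A = {"a", "b", "c"}
# A sample map from A into P(A), chosen arbitrarily for illustration.
f = {
    "a": frozenset({"a"}),
    "b": frozenset({"a", "c"}),
    "c": frozenset(),
}

B = diagonal_set(A, f)
print(sorted(B))
print(B in f.values())   # is B in the range of f?
```

Changing the sample values of `f` changes B, but B never shows up among the values of `f`, which is the content of the general argument.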

We now focus on the general argument. Because we have assumed that our
function $f : A \to P(A)$ is onto, it must be that $B = f(a')$ for some $a' \in A$. The
contradiction arises when we consider whether or not $a'$ is an element of B.

Exercise 1.6.8. (a) First, show that the case $a' \in B$ leads to a contradiction.

(b) Now, finish the argument by showing that the case $a' \notin B$ is equally
unacceptable. □


To  get  an  initial  sense  of  its  broad  significance,  let’s  apply  this  result  to 
the  set  of  natural  numbers.  Cantor’s  Theorem  states  that  there  is  no  onto 
function  from  N  to  P(N);  in  other  words,  the  power  set  of  the  natural  numbers 
is  uncountable.  How  does  the  cardinality  of  this  newly  discovered  uncountable 
set  compare  to  the  uncountable  set  of  real  numbers? 

Exercise  1.6.9.  Using  the  various  tools  and  techniques  developed  in  the  last 
two sections (including the exercises from Section 1.5), give a compelling argument showing that P(N) ~ R.

Exercise 1.6.10. As a final exercise, answer each of the following by establishing a 1-1 correspondence with a set of known cardinality.

(a)  Is  the  set  of  all  functions  from  {0, 1}  to  N  countable  or  uncountable? 

(b)  Is  the  set  of  all  functions  from  N  to  {0, 1}  countable  or  uncountable? 

(c)  Given  a  set  P,  a  subset  A  of  P(P)  is  called  an  antichain  if  no  element  of  A 
is  a  subset  of  any  other  element  of  A.  Does  P(N)  contain  an  uncountable 
antichain? 

1.7 Epilogue

The  relationship  of  having  the  same  cardinality  is  an  equivalence  relation  (see 
Exercise 1.5.5), meaning, roughly, that all of the sets in the mathematical universe can be organized into disjoint groups according to their size. Two sets
appear in the same group, or equivalence class, if and only if they have the same
cardinality.  Thus,  N,  Z,  and  Q  are  grouped  together  in  one  class  with  all  of  the 
other  countable  sets,  whereas  R  is  in  another  class  that  includes  the  intervals 
(a,  b)  as  well  as  P(N).  One  implication  of  Cantor’s  Theorem  is  that  P(R) — the 
set  of  all  subsets  of  R — is  in  a  different  class  from  R,  and  there  is  no  reason 
to  stop  here.  The  set  of  subsets  of  P(R) — namely  P(P(R)) — is  in  yet  another 
class,  and  this  process  continues  indefinitely. 

Having divided the universe of sets into disjoint groups, it would be convenient to attach a “number” to each collection which could be used the way
natural numbers are used to refer to the sizes of finite sets. Given a set X,
there exists something called the cardinal number of X, denoted card X, which
behaves very much in this fashion. For instance, two sets X and Y satisfy
card X = card Y if and only if X ~ Y. (Rigorously defining card X requires
some significant set theory. One way this is done is to define card X to be a
very particular set that can always be uniquely found in the same equivalence
class as X.)

Looking back at Cantor’s Theorem, we get the strong sense that there is an
order on the sizes of infinite sets that should be reflected in our new cardinal
number system. Specifically, if it is possible to map a set X into Y in a 1-1
fashion, then we want $\operatorname{card} X \le \operatorname{card} Y$. Writing the strict inequality $\operatorname{card} X < \operatorname{card} Y$
should indicate that it is possible to map X into Y but that it is not the
case that X ~ Y. Restated in this notation, Cantor’s Theorem states that for
every set A, $\operatorname{card} A < \operatorname{card} P(A)$.

There are some significant details to work out. A kind of metaphysical problem arises when we realize that an implication of Cantor’s Theorem is that there
can be no “largest” set. A declaration such as, “Let U be the set of all possible
things,” is paradoxical because we immediately get that $\operatorname{card} U < \operatorname{card} P(U)$
and thus the set U does not contain everything it was advertised to hold. Issues such as this one are ultimately resolved by imposing some restrictions on
what can qualify as a set. As set theory was formalized, the axioms had to
be crafted so that objects such as U are simply not allowed. A more down-to-earth problem in need of attention is demonstrating that our definition of
“$\le$” between cardinal numbers really is an ordering. This involves showing that
cardinal numbers possess a property analogous to real numbers, which states
that if $\operatorname{card} X \le \operatorname{card} Y$ and $\operatorname{card} Y \le \operatorname{card} X$, then $\operatorname{card} X = \operatorname{card} Y$. In the
end, this boils down to proving that if there exists $f : X \to Y$ that is 1-1,
and if there exists $g : Y \to X$ that is 1-1, then it is possible to find a function
$h : X \to Y$ that is both 1-1 and onto. A proof of this fact eluded Cantor
but was eventually supplied independently by Ernst Schröder (in 1896) and Felix Bernstein (in 1898). An argument for the Schröder-Bernstein Theorem is
outlined in Exercise 1.5.11.

There was another deep problem stemming from the budding theory of cardinal numbers that occupied Cantor and which was not resolved during his
lifetime. Because of the importance of countable sets, the symbol $\aleph_0$ (“aleph
naught”) is frequently used for card N. The subscript “0” is appropriate when
we remember that countable sets are the smallest type of infinite set. In terms
of cardinal numbers, if $\operatorname{card} X < \aleph_0$, then X is finite. Thus, $\aleph_0$ is the smallest infinite cardinal number. The cardinality of R is also significant enough to
deserve the special designation $c = \operatorname{card} \mathbf{R} = \operatorname{card} (0, 1)$. The content of Theorems 1.5.6 and 1.6.1 is that $\aleph_0 < c$. The question that plagued Cantor was
whether there were any cardinal numbers strictly in between these two. Put
another way, does there exist a set $A \subseteq \mathbf{R}$ with $\operatorname{card} \mathbf{N} < \operatorname{card} A < \operatorname{card} \mathbf{R}$?
Cantor was of the opinion that no such set existed. In the ordering of cardinal
numbers, he conjectured, $c$ was the immediate successor of $\aleph_0$.

Cantor’s “continuum hypothesis,” as it came to be called, was one of the
most famous mathematical challenges of the past century. Its unexpected resolution came in two parts. In 1940, the Austrian-born logician and mathematician
Kurt Gödel demonstrated that, using only the agreed-upon set of axioms of set
theory, there was no way to disprove the continuum hypothesis. In 1963, Paul
Cohen successfully showed that, under the same rules, it was also impossible to
prove this conjecture. Taken together, what these two discoveries imply is that
the continuum hypothesis is undecidable. It can be accepted or rejected as a
statement about the nature of infinite sets, and in neither case will any logical
contradictions arise.

The mention of Kurt Gödel brings to mind a final comment about the significance of Cantor’s work. Gödel is best known for his “Incompleteness Theorems,” which pertain to the strength of axiomatic systems in general. What
Gödel showed was that any consistent axiomatic system created to study arithmetic was necessarily destined to be “incomplete” in the sense that there would
always be true statements that the system of axioms would be too weak to
prove. At the heart of Gödel’s very complicated proof is a type of manipulation
closely related to what is happening in the proofs of Theorems 1.6.1 and 1.6.2.
Variations of Cantor’s proof methods can also be found in the limitative results of computer science. The “halting problem” asks, loosely, whether some
general algorithm exists that can look at every program and decide if that program eventually terminates. The proof that no such algorithm exists uses a
diagonalization-type construction at the core of the argument. The main point
to make is that not only are the implications of Cantor’s theorems profound
but the argumentative techniques are as well. As a more immediate example of
this phenomenon, the diagonalization method is used again in Chapter 6, in a
constructive way, as a crucial step in the proof of the Arzelà-Ascoli Theorem.


Chapter  2 

2.1 Discussion: Rearrangements of Infinite Series

Consider  the  infinite  series 

$$\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \frac{1}{7} - \frac{1}{8} + \cdots.$$

If we naively begin adding from the left-hand side, we get a sequence of what
are called partial sums. In other words, let $s_n$ equal the sum of the first n terms
of the series, so that $s_1 = 1$, $s_2 = 1/2$, $s_3 = 5/6$, $s_4 = 7/12$, and so on. One
immediate observation is that the successive sums oscillate in a progressively
narrower space. The odd sums decrease ($s_1 > s_3 > s_5 > \cdots$) while the even
sums increase ($s_2 < s_4 < s_6 < \cdots$).


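[Figure: a number line from 0 to 1 marking the even partial sums $s_2$, $s_4$, $s_6$ to the left of $S \approx .69$ and the odd partial sums $s_5$, $s_3$, $s_1$ to the right.]

$$s_2 < s_4 < s_6 < \cdots \; S \; \cdots < s_5 < s_3 < s_1$$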

It seems reasonable, and we will soon prove, that the sequence $(s_n)$ eventually homes in on a value, call it S, where the odd and even partial sums “meet.”
At this moment, we cannot compute S precisely, but we know it falls somewhere
between 7/12 and 5/6. Summing a few hundred terms reveals that $S \approx .69$.
Whatever its value, there is now an overwhelming temptation to write


$$S = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \frac{1}{7} - \frac{1}{8} + \cdots \tag{1}$$

meaning, perhaps, that if we could indeed add up all infinitely many of these
numbers,  then  the  sum  would  equal  S.  A  more  familiar  example  of  an  equation 
of  this  type  might  be 


$$2 = 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \frac{1}{32} + \frac{1}{64} + \cdots,$$


the  only  difference  being  that  in  the  second  equation  we  have  a  more  recognizable 
value  for  the  sum. 

But now for the crux of the matter. The symbols +, −, and = in the preceding equations are deceptively familiar notions being used in a very unfamiliar
way. The crucial question is whether or not properties of addition and equality
that are well understood for finite sums remain valid when applied to infinite objects such as equation (1). The answer, as we are about to witness, is somewhat
ambiguous.

Treating  equation  (1)  in  a  standard  algebraic  way,  let’s  multiply  through  by 
1/2  and  add  it  back  to  equation  (1): 

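$$\begin{aligned}
\tfrac{1}{2} S &= \tfrac{1}{2} - \tfrac{1}{4} + \tfrac{1}{6} - \tfrac{1}{8} + \tfrac{1}{10} - \tfrac{1}{12} + \cdots \\
+\; S &= 1 - \tfrac{1}{2} + \tfrac{1}{3} - \tfrac{1}{4} + \tfrac{1}{5} - \tfrac{1}{6} + \tfrac{1}{7} - \tfrac{1}{8} + \tfrac{1}{9} - \tfrac{1}{10} + \tfrac{1}{11} - \tfrac{1}{12} + \tfrac{1}{13} - \cdots \\
\tfrac{3}{2} S &= 1 + \tfrac{1}{3} - \tfrac{1}{2} + \tfrac{1}{5} + \tfrac{1}{7} - \tfrac{1}{4} + \tfrac{1}{9} + \tfrac{1}{11} - \tfrac{1}{6} + \tfrac{1}{13} + \cdots \qquad (2)
\end{aligned}$$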


Now,  look  carefully  at  the  result.  The  sum  in  equation  (2)  consists  precisely 
of  the  same  terms  as  those  in  the  original  equation  (1),  only  in  a  different  order. 
Specifically,  the  series  in  (2)  is  a  rearrangement  of  (1)  where  we  list  the  first 
two positive terms $(1 + \frac{1}{3})$ followed by the first negative term $(-\frac{1}{2})$, followed
by the next two positive terms $(\frac{1}{5} + \frac{1}{7})$ and then the next negative term $(-\frac{1}{4})$.
Continuing  this,  it  is  apparent  that  every  term  in  (2)  appears  in  (1)  and  vice 
versa.  The  rub  comes  when  we  realize  that  equation  (2)  asserts  that  the  sum  of 
these  rearranged,  but  otherwise  unaltered,  numbers  is  equal  to  3/2  its  original 
value.  Indeed,  adding  a  few  hundred  terms  of  equation  (2)  produces  partial 
sums  in  the  neighborhood  of  1.03.  Addition,  in  this  infinite  setting,  is  not 
commutative! 
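
The numerical claims in this discussion are easy to reproduce. The Python sketch below is a rough floating-point experiment, not a proof; the function names are assumptions introduced for the illustration. It sums 300 terms of the series in its original order and 300 terms in the “two positives, one negative” order.

```python
def alternating_harmonic(n_terms):
    """Partial sum s_n of 1 - 1/2 + 1/3 - 1/4 + ..."""
    return sum((-1) ** (n + 1) / n for n in range(1, n_terms + 1))


def rearranged(n_groups):
    """Sum n_groups blocks, each made of two positive terms and one negative term."""
    total = 0.0
    for k in range(1, n_groups + 1):
        total += 1 / (4 * k - 3) + 1 / (4 * k - 1)   # next two odd denominators
        total -= 1 / (2 * k)                          # next even denominator
    return total


s = alternating_harmonic(300)
t = rearranged(100)          # 100 blocks of three terms, so also 300 terms
print(round(s, 4), round(t, 4), round(t / s, 4))
```

The two partial sums use the same kinds of terms, yet the second settles near a value about one and a half times the first.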

Let’s  look  at  a  similar  rearrangement  of  the  series 


$$\sum_{n=0}^{\infty} \left(-\frac{1}{2}\right)^n.$$


This  series  is  geometric  with  first  term  1  and  common  ratio  r  =  —1/2.  Using 
the  formula  1/(1  —  r)  for  the  sum  of  a  geometric  series  (Example  2.7.5),  we  get 

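$$1 - \frac{1}{2} + \frac{1}{4} - \frac{1}{8} + \frac{1}{16} - \frac{1}{32} + \frac{1}{64} - \frac{1}{128} + \frac{1}{256} - \cdots = \frac{1}{1 - (-\frac{1}{2})} = \frac{2}{3}.$$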

This  time,  some  computational  experimentation  with  the  “two  positives,  one 
negative”  rearrangement 

$$1 + \frac{1}{4} - \frac{1}{2} + \frac{1}{16} + \frac{1}{64} - \frac{1}{8} + \frac{1}{256} + \frac{1}{1024} - \frac{1}{32} + \cdots$$

yields partial sums quite close to 2/3. The sum of the first 30 terms, for instance,
equals .666667. Infinite addition is commutative in some instances but not in
others.
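
The same experiment can be repeated for the geometric series. The Python sketch below is an illustration only (the function name is an assumption made here); it uses exact rational arithmetic from the standard `fractions` module so that rounding plays no role until the final conversion to a decimal.

```python
from fractions import Fraction


def rearranged_geometric(n_groups):
    """Two positive terms, then one negative term, of the series sum of (-1/2)^n."""
    total = Fraction(0)
    for k in range(n_groups):
        # Even powers of -1/2 are the positive terms (1/4)^m; take m = 2k and 2k + 1.
        total += Fraction(1, 4) ** (2 * k) + Fraction(1, 4) ** (2 * k + 1)
        # Odd powers of -1/2 are the negative terms -(1/2)(1/4)^m; take m = k.
        total -= Fraction(1, 2) * Fraction(1, 4) ** k
    return total


first_30 = rearranged_geometric(10)      # 10 groups of three terms
print(float(first_30), float(Fraction(2, 3)))
```

Here the rearranged partial sums stay close to the original value 2/3, in contrast with the alternating harmonic series.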

Far from being a charming theoretical oddity of infinite series, this phenomenon can be the source of great consternation in many applied situations.
How, for instance, should a double summation over two index variables be defined? Let’s say we are given a grid of real numbers $\{a_{ij} : i, j \in \mathbf{N}\}$, where
$a_{ij} = 1/2^{j-i}$ if $j > i$, $a_{ij} = -1$ if $j = i$, and $a_{ij} = 0$ if $j < i$.


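$$\begin{array}{rrrrrr}
-1 & \frac{1}{2} & \frac{1}{4} & \frac{1}{8} & \frac{1}{16} & \cdots \\
0 & -1 & \frac{1}{2} & \frac{1}{4} & \frac{1}{8} & \cdots \\
0 & 0 & -1 & \frac{1}{2} & \frac{1}{4} & \cdots \\
0 & 0 & 0 & -1 & \frac{1}{2} & \cdots \\
0 & 0 & 0 & 0 & -1 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{array}$$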


We would like to attach a mathematical meaning to the summation

$$\sum_{i,j=1}^{\infty} a_{ij}$$


whereby  we  intend  to  include  every  term  in  the  preceding  array  in  the  total. 
One  natural  idea  is  to  temporarily  fix  i  and  sum  across  each  row.  A  moment’s 
reflection  (and  a  fact  about  geometric  series)  shows  that  each  row  sums  to  0. 
Summing  the  sums  of  the  rows,  we  get 


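$$\sum_{i,j=1}^{\infty} a_{ij} = \sum_{i=1}^{\infty} \left( \sum_{j=1}^{\infty} a_{ij} \right) = \sum_{i=1}^{\infty} (0) = 0.$$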


We  could  just  as  easily  have  decided  to  fix  j  and  sum  down  each  column  first. 
In  this  case,  we  have 


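$$\sum_{i,j=1}^{\infty} a_{ij} = \sum_{j=1}^{\infty} \left( \sum_{i=1}^{\infty} a_{ij} \right) = \sum_{j=1}^{\infty} \left( \frac{-1}{2^{j-1}} \right) = -2.$$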


Changing the order of the summation changes the value of the sum! One common way that double sums arise (although not this particular one) is from the
multiplication of two series. There is a natural desire to write

$$\left( \sum a_i \right) \left( \sum b_j \right) = \sum_{i,j=1}^{\infty} a_i b_j,$$


except  that  the  expression  on  the  right-hand  side  makes  no  sense  at  the  moment. 
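
The two orders of summation for the grid can be checked with exact arithmetic. The Python sketch below is an illustration, with function names chosen here and with the grid entries read as $1/2^{j-i}$ above the diagonal, $-1$ on the diagonal, and 0 below it. Rows are infinite, so row sums are truncated; each column has only finitely many nonzero entries, so column sums are exact.

```python
from fractions import Fraction


def a(i, j):
    """Grid entry: 1/2^(j-i) above the diagonal, -1 on it, 0 below it."""
    if j > i:
        return Fraction(1, 2 ** (j - i))
    return Fraction(-1) if j == i else Fraction(0)


def row_sum(i, n_cols):
    """Sum of row i over columns 1..n_cols (a truncation of an infinite row)."""
    return sum(a(i, j) for j in range(1, n_cols + 1))


def col_sum(j):
    """Sum of column j; entries below row j are zero, so this is exact."""
    return sum(a(i, j) for i in range(1, j + 1))


print([float(row_sum(1, n)) for n in (5, 10, 20)])       # row sums shrink toward 0
print(float(sum(col_sum(j) for j in range(1, 21))))      # column totals approach -2
```

Summing rows first drives every row toward 0, while summing columns first accumulates toward $-2$, matching the two computations in the text.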

It is the pathologies that give rise to the need for rigor. A satisfying resolution to the questions raised will require that we be absolutely precise about what
we  mean  as  we  manipulate  these  infinite  objects.  It  may  seem  that  progress  is 
slow  at  first,  but  that  is  because  we  do  not  want  to  fall  into  the  trap  of  letting 
the  biases  of  our  intuition  corrupt  our  arguments.  Rigorous  proofs  are  meant 
to  be  a  check  on  intuition,  and  in  the  end  we  will  see  that  they  vastly  improve 
our  mental  picture  of  the  mathematical  infinite. 

As a final example, consider something as intuitively fundamental as the
associative property of addition applied to the series $\sum_{n=1}^{\infty} (-1)^n$. Grouping
the terms one way gives

$$(-1 + 1) + (-1 + 1) + (-1 + 1) + (-1 + 1) + \cdots = 0 + 0 + 0 + 0 + \cdots = 0,$$

whereas grouping in another yields

$$-1 + (1 - 1) + (1 - 1) + (1 - 1) + \cdots = -1 + 0 + 0 + 0 + \cdots = -1.$$

Manipulations  that  are  legitimate  in  finite  settings  do  not  always  extend  to 
infinite  settings.  Deciding  when  they  do  and  why  they  do  not  is  one  of  the 
central  themes  of  analysis. 

An understanding of infinite series depends heavily on a clear understanding of
the  theory  of  sequences.  In  fact,  most  of  the  concepts  in  analysis  can  be  reduced 
to  statements  about  the  behavior  of  sequences.  Thus,  we  will  spend  a  significant 
amount  of  time  investigating  sequences  before  taking  on  infinite  series. 

Definition  2.2.1.  A  sequence  is  a  function  whose  domain  is  N. 

This formal definition leads immediately to the familiar depiction of a sequence as an ordered list of real numbers. Given a function $f : \mathbf{N} \to \mathbf{R}$, $f(n)$ is
just the nth term on the list. The notation for sequences reinforces this familiar
understanding. 


Example 2.2.2. Each of the following is a common way to describe a sequence.

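(i) $\left(1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \ldots\right)$,

(ii) $\left(\frac{1+n}{n}\right)_{n=1}^{\infty} = \left(\frac{2}{1}, \frac{3}{2}, \frac{4}{3}, \ldots\right)$,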


(iii) $(a_n)$, where $a_n = 2^n$ for each $n \in \mathbf{N}$,

(iv) $(x_n)$, where $x_1 = 2$ and $x_{n+1} = \frac{x_n + 1}{2}$.


On occasion, it will be more convenient to index a sequence beginning with
$n = 0$ or $n = n_0$ for some natural number $n_0$ different from 1. These minor
variations should cause no confusion. What is essential is that a sequence be an
infinite list of real numbers. What happens at the beginning of such a list is of
little importance in most cases. The business of analysis is concerned with the
behavior of the infinite “tail” of a given sequence.

We  now  present  what  is  arguably  the  most  important  definition  in  the  book. 


Definition 2.2.3 (Convergence of a Sequence). A sequence $(a_n)$ converges
to a real number $a$ if, for every positive number $\epsilon$, there exists an $N \in \mathbf{N}$ such
that whenever $n \ge N$ it follows that $|a_n - a| < \epsilon$.


To indicate that $(a_n)$ converges to $a$, we usually write either $\lim a_n = a$ or
$(a_n) \to a$. The notation $\lim_{n \to \infty} a_n = a$ is also standard.

In an effort to decipher this complicated definition, it helps first to consider
the ending phrase “$|a_n - a| < \epsilon$,” and think about the points that satisfy an
inequality of this type.


Definition 2.2.4. Given a real number $a \in \mathbf{R}$ and a positive number $\epsilon > 0$,
the set

$$V_\epsilon(a) = \{x \in \mathbf{R} : |x - a| < \epsilon\}$$

is called the $\epsilon$-neighborhood of $a$.


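Notice that $V_\epsilon(a)$ consists of all of those points whose distance from $a$ is less
than $\epsilon$. Said another way, $V_\epsilon(a)$ is an interval, centered at $a$, with radius $\epsilon$.

[Figure: the interval $V_\epsilon(a)$ on the number line, extending from $a - \epsilon$ to $a + \epsilon$ and centered at $a$.]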


Recasting the definition of convergence in terms of $\epsilon$-neighborhoods gives a
more geometric impression of what is being described.

Definition 2.2.3B (Convergence of a Sequence: Topological Version).
A sequence $(a_n)$ converges to $a$ if, given any $\epsilon$-neighborhood $V_\epsilon(a)$ of $a$, there
exists a point in the sequence after which all of the terms are in $V_\epsilon(a)$. In other
words, every $\epsilon$-neighborhood contains all but a finite number of the terms of
$(a_n)$.


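[Figure: a number line showing terms of the sequence, beginning with $a_1$, and the neighborhood $V_\epsilon(a)$ extending from $a - \epsilon$ to $a + \epsilon$.]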


Definition 2.2.3 and Definition 2.2.3B say precisely the same thing; the natural number N in the original version of the definition is the point where the
sequence $(a_n)$ enters $V_\epsilon(a)$, never to leave. It should be apparent that the value
of N depends on the choice of $\epsilon$. The smaller the $\epsilon$-neighborhood, the larger N
may have to be.

Example 2.2.5. Consider the sequence $(a_n)$, where $a_n = 1/\sqrt{n}$.

Our intuitive understanding of limits points confidently to the conclusion
that

$$\lim \left( \frac{1}{\sqrt{n}} \right) = 0.$$


Before trying to prove this not too impressive fact, let’s first explore the relationship between $\epsilon$ and N in the definition of convergence. For the moment, take
$\epsilon$ to be 1/10. This defines a sort of “target zone” for the terms in the sequence.
By claiming that the limit of $(a_n)$ is 0, we are saying that the terms in this
sequence eventually get arbitrarily close to 0. How close? What do we mean
by “eventually”? We have set $\epsilon = 1/10$ as our standard for closeness, which
leads to the $\epsilon$-neighborhood (−1/10, 1/10) centered around the limit 0. How
far out into the sequence must we look before the terms fall into this interval?
The 100th term $a_{100} = 1/10$ puts us right on the boundary, and a little thought
reveals that

$$\text{if } n > 100, \text{ then } a_n \in \left( -\frac{1}{10}, \frac{1}{10} \right).$$


Thus,  for  e  =  1/10  we  choose  N  =  101  (or  anything  larger)  as  our  response. 

Now, our choice of $\epsilon = 1/10$ was rather whimsical, and we can do this again, letting $\epsilon = 1/50$. In this case, our target neighborhood shrinks to $(-1/50, 1/50)$, and it is apparent that we must travel farther out into the sequence before $a_n$ falls into this interval. How far? Essentially, we require that
$$\frac{1}{\sqrt{n}} < \frac{1}{50},$$
which occurs as long as $n > 50^2 = 2500$.

Thus, $N = 2501$ is a suitable response to the challenge of $\epsilon = 1/50$.

It may seem as though this duel could continue forever, with different $\epsilon$ challenges being handed to us one after another, each one requiring a suitable value of $N$ in response. In a sense, this is correct, except that the game is effectively over the instant we recognize a rule for how to choose $N$ given an arbitrary $\epsilon > 0$. For this problem, the desired algorithm is implicit in the algebra carried out to compute the previous response of $N = 2501$. Whatever $\epsilon$ happens to be, we want
$$\frac{1}{\sqrt{n}} < \epsilon, \quad \text{which is equivalent to insisting that} \quad n > \frac{1}{\epsilon^2}.$$

With this observation, we are ready to write the formal argument.
We claim that
$$\lim \left(\frac{1}{\sqrt{n}}\right) = 0.$$


Proof. Let $\epsilon > 0$ be an arbitrary positive number. Choose a natural number $N$ satisfying
$$N > \frac{1}{\epsilon^2}.$$
We now verify that this choice of $N$ has the desired property. Let $n \ge N$. Then,
$$n > \frac{1}{\epsilon^2} \quad \text{implies} \quad \frac{1}{\sqrt{n}} < \epsilon, \quad \text{and hence} \quad |a_n - 0| < \epsilon.$$
□
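The rule "choose $N > 1/\epsilon^2$" can also be checked mechanically. The following Python sketch is only an illustration, not part of the formal argument: the helper names `response_N` and `tail_is_within` are hypothetical, exact rational arithmetic is used so that nothing is lost to rounding at the boundary, and a finite scan of terms can corroborate the proof but never replace it.

```python
from fractions import Fraction
import math

def response_N(eps: Fraction) -> int:
    """Smallest natural number N with N > 1/eps**2."""
    return math.floor(1 / eps**2) + 1

def tail_is_within(eps: Fraction, N: int, how_many: int = 5000) -> bool:
    """Check |1/sqrt(n) - 0| < eps for n = N, ..., N + how_many - 1.

    For positive n, 1/sqrt(n) < eps is equivalent to 1 < eps**2 * n,
    which can be tested exactly with rationals.
    """
    return all(eps**2 * n > 1 for n in range(N, N + how_many))

for eps in (Fraction(1, 10), Fraction(1, 50)):
    N = response_N(eps)
    print(eps, N, tail_is_within(eps, N))
```

For the two challenges discussed in the example, the responses produced are 101 and 2501, and the check fails if one tries $N = 100$ for $\epsilon = 1/10$, since the 100th term sits exactly on the boundary.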


Quantifiers 

The definition of convergence given earlier is the result of hundreds of years of refining the intuitive notion of limit into a mathematically rigorous statement. The logic involved is complicated and is intimately tied to the use of the quantifiers “for all” and “there exists.” Learning to write a grammatically correct convergence proof goes hand in hand with a deep understanding of why the quantifiers appear in the order that they do.

The definition begins with the phrase,

“For all $\epsilon > 0$, there exists $N \in \mathbf{N}$ such that ...”

Looking back at our first example, we see that our formal proof begins with, “Let $\epsilon > 0$ be an arbitrary positive number.” This is followed by a construction of $N$ and then a demonstration that this choice of $N$ has the desired property. This, in fact, is a basic outline for how every convergence proof should be presented.

Template for a proof that $(x_n) \to x$:

-  “Let $\epsilon > 0$ be arbitrary.”

-  Demonstrate a choice for $N \in \mathbf{N}$. This step usually requires the most work, almost all of which is done prior to actually writing the formal proof.

-  Now, show that $N$ actually works.

-  “Assume $n \ge N$.”

-  With $N$ well chosen, it should be possible to derive the inequality $|x_n - x| < \epsilon$.


Example 2.2.6. Show
$$\lim \left(\frac{n+1}{n}\right) = 1.$$

As mentioned, before attempting a formal proof, we first need to do some preliminary scratch work. In the first example, we experimented by assigning specific values to $\epsilon$ (and it is not a bad idea to do this again), but let us skip straight to the algebraic punch line. The last line of our proof should be that for suitably large values of $n$,
$$\left|\frac{n+1}{n} - 1\right| < \epsilon.$$
Because
$$\left|\frac{n+1}{n} - 1\right| = \frac{1}{n},$$
this is equivalent to the inequality $1/n < \epsilon$ or $n > 1/\epsilon$. Thus, choosing $N$ to be an integer greater than $1/\epsilon$ will suffice.

With  the  work  of  the  proof  done,  all  that  remains  is  the  formal  writeup. 

Proof. Let $\epsilon > 0$ be arbitrary. Choose $N \in \mathbf{N}$ with $N > 1/\epsilon$. To verify that this choice of $N$ is appropriate, let $n \in \mathbf{N}$ satisfy $n \ge N$. Then, $n \ge N$ implies $n > 1/\epsilon$, which is the same as saying $1/n < \epsilon$. Finally, this means
$$\left|\frac{n+1}{n} - 1\right| < \epsilon,$$
as desired. □

It  is  instructive  to  see  what  goes  wrong  in  the  previous  example  if  we  try  to 
prove  that  our  sequence  converges  to  some  limit  other  than  1. 
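One way to see the failure concretely is to pick a wrong candidate limit and search for terms that stay outside its neighborhood. The Python sketch below is a hedged illustration only: the candidate limit $11/10$, the challenge $\epsilon = 1/20$, and the helper name `first_violation` are all assumptions made for the example, and the search covers only finitely many terms.

```python
from fractions import Fraction

def a(n: int) -> Fraction:
    """The n-th term (n + 1)/n of the sequence from Example 2.2.6."""
    return Fraction(n + 1, n)

def first_violation(limit: Fraction, eps: Fraction, N: int, horizon: int = 10_000):
    """Return the first n in [N, N + horizon) with |a_n - limit| >= eps, or None."""
    for n in range(N, N + horizon):
        if abs(a(n) - limit) >= eps:
            return n
    return None

wrong_limit = Fraction(11, 10)   # hypothetical candidate limit
eps = Fraction(1, 20)

# Whatever N is proposed, some later term still lies outside the neighborhood.
print([first_violation(wrong_limit, eps, N) for N in (15, 100, 1000)])
# For the true limit 1, the same challenge is met once N > 1/eps = 20.
print(first_violation(Fraction(1), eps, 21))
```

With the wrong limit, every proposed $N$ is defeated by a later term, because the terms settle within $1/20$ of 1 and therefore stay at least $1/20$ away from $11/10$.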

Theorem  2.2.7  (Uniqueness  of  Limits).  The  limit  of  a  sequence,  when  it 
exists,  must  be  unique. 

Proof.  Exercise  2.2.6.  □ 

Divergence 

Significant  insight  into  the  role  of  the  quantifiers  in  the  definition  of  convergence 
can  be  gained  by  studying  an  example  of  a  sequence  that  does  not  have  a  limit. 

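Example 2.2.8. Consider the sequence
$$\left(1, -\frac{1}{2}, \frac{1}{3}, -\frac{1}{4}, \frac{1}{5}, -\frac{1}{5}, \frac{1}{5}, -\frac{1}{5}, \frac{1}{5}, -\frac{1}{5}, \frac{1}{5}, -\frac{1}{5}, \cdots\right).$$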

How can we argue that this sequence does not converge to zero? Looking at the first few terms, it seems the initial evidence actually supports such a conclusion. Given a challenge of $\epsilon = 1/2$, a little reflection reveals that after $N = 3$ all the terms fall into the neighborhood $(-1/2, 1/2)$. We could also handle $\epsilon = 1/4$. (What is the smallest possible $N$ in this case?)

But the definition of convergence says “For all $\epsilon > 0$...,” and it should be apparent that there is no response to a choice of $\epsilon = 1/10$, for instance. This leads us to an important observation about the logical negation of the definition of convergence of a sequence. To prove that a particular number $x$ is not the limit of a sequence $(x_n)$, we must produce a single value of $\epsilon$ for which no $N \in \mathbf{N}$ works. More generally speaking, the negation of a statement that begins “For all P, there exists Q...” is the statement, “For at least one P, no Q is possible...” For instance, how could we disprove the spurious claim that “At every college in the United States, there is a student who is at least seven feet tall”?

We have argued that the preceding sequence does not converge to 0. Let’s argue against the claim that it converges to $1/5$. Choosing $\epsilon = 1/10$ produces the neighborhood $(1/10, 3/10)$. Although the sequence continually revisits this neighborhood, there is no point at which it enters and never leaves as the definition requires. Thus, no $N$ exists for $\epsilon = 1/10$, so the sequence does not converge to $1/5$.
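The negation "for at least one $\epsilon$, no $N$ works" can be explored numerically. The sketch below assumes the sequence of Example 2.2.8 is $(1, -1/2, 1/3, -1/4, 1/5, -1/5, 1/5, -1/5, \ldots)$; the helper name `escape_index` is hypothetical, and because only finitely many values of $N$ are tried, the output illustrates the argument rather than proving it.

```python
from fractions import Fraction

def a(n: int) -> Fraction:
    """n-th term of (1, -1/2, 1/3, -1/4, 1/5, -1/5, 1/5, -1/5, ...)."""
    sign = 1 if n % 2 == 1 else -1
    return Fraction(sign, min(n, 5))

def escape_index(limit: Fraction, eps: Fraction, N: int, horizon: int = 1000):
    """Return some n >= N with |a_n - limit| >= eps, or None if none is found."""
    for n in range(N, N + horizon):
        if abs(a(n) - limit) >= eps:
            return n
    return None

eps = Fraction(1, 10)
for limit in (Fraction(0), Fraction(1, 5)):
    # Every proposed N in this range is defeated by a later term.
    print(limit, all(escape_index(limit, eps, N) is not None for N in range(1, 500)))
```

By contrast, `escape_index(Fraction(0), Fraction(1, 2), 3)` finds nothing, which matches the observation that $N = 3$ answers the challenge $\epsilon = 1/2$.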

Of  course,  this  sequence  does  not  converge  to  any  other  real  number,  and  it 
would  be  more  satisfying  to  simply  say  that  this  sequence  does  not  converge. 

Definition  2.2.9.  A  sequence  that  does  not  converge  is  said  to  diverge. 

Although  it  is  not  too  difficult,  we  will  postpone  arguing  for  divergence  in  general 
until  we  develop  a  more  economical  divergence  criterion  later  in  Section  2.5. 


Exercises 


Exercise 2.2.1. What happens if we reverse the order of the quantifiers in Definition 2.2.3?

Definition: A sequence $(x_n)$ verconges to $x$ if there exists an $\epsilon > 0$ such that for all $N \in \mathbf{N}$ it is true that $n \ge N$ implies $|x_n - x| < \epsilon$.

Give an example of a vercongent sequence. Is there an example of a vercongent sequence that is divergent? Can a sequence verconge to two different values? What exactly is being described in this strange definition?


Exercise  2.2.2.  Verify,  using  the  definition  of  convergence  of  a  sequence,  that 
the  following  sequences  converge  to  the  proposed  limit. 

(a)  limfsii  =  §. 

(b)  lim  =  0. 

(c)  lim  =  0. 

Exercise  2.2.3.  Describe  what  we  would  have  to  demonstrate  in  order  to  dis¬ 
prove  each  of  the  following  statements. 

(a)  At  every  college  in  the  United  States,  there  is  a  student  who  is  at  least 
seven  feet  tall. 

(b)  For  all  colleges  in  the  United  States,  there  exists  a  professor  who  gives 
every  student  a  grade  of  either  A  or  B. 

(c)  There  exists  a  college  in  the  United  States  where  every  student  is  at  least 
six  feet  tall. 


Exercise  2.2.4.  Give  an  example  of  each  or  state  that  the  request  is  impossible. 
For  any  that  are  impossible,  give  a  compelling  argument  for  why  that  is  the  case. 

(a) A sequence with an infinite number of ones that does not converge to one.

(b) A sequence with an infinite number of ones that converges to a limit not equal to one.

(c) A divergent sequence such that for every $n \in \mathbf{N}$ it is possible to find $n$ consecutive ones somewhere in the sequence.


Exercise 2.2.5. Let $[[x]]$ be the greatest integer less than or equal to $x$. For example, $[[\pi]] = 3$ and $[[3]] = 3$. For each sequence, find $\lim a_n$ and verify it with the definition of convergence.

(a) $a_n = [[5/n]]$,

(b) $a_n = [[(12 + 4n)/3n]]$.

Reflecting on these examples, comment on the statement following Definition 2.2.3 that “the smaller the $\epsilon$-neighborhood, the larger $N$ may have to be.”


Exercise 2.2.6. Prove Theorem 2.2.7. To get started, assume $(a_n) \to a$ and also that $(a_n) \to b$. Now argue $a = b$.

Exercise  2.2.7.  Here  are  two  useful  definitions: 


(i) A sequence $(a_n)$ is eventually in a set $A \subseteq \mathbf{R}$ if there exists an $N \in \mathbf{N}$ such that $a_n \in A$ for all $n \ge N$.

(ii) A sequence $(a_n)$ is frequently in a set $A \subseteq \mathbf{R}$ if, for every $N \in \mathbf{N}$, there exists an $n \ge N$ such that $a_n \in A$.

(a) Is the sequence $(-1)^n$ eventually or frequently in the set $\{1\}$?

(b) Which definition is stronger? Does frequently imply eventually or does eventually imply frequently?

(c) Give an alternate rephrasing of Definition 2.2.3B using either frequently or eventually. Which is the term we want?

(d) Suppose an infinite number of terms of a sequence $(x_n)$ are equal to 2. Is $(x_n)$ necessarily eventually in the interval $(1.9, 2.1)$? Is it frequently in $(1.9, 2.1)$?

Exercise  2.2.8.  For  some  additional  practice  with  nested  quantifiers,  consider 
the  following  invented  definition: 

Let’s call a sequence $(x_n)$ zero-heavy if there exists $M \in \mathbf{N}$ such that for all $N \in \mathbf{N}$ there exists $n$ satisfying $N \le n \le N + M$ where $x_n = 0$.

(a) Is the sequence $(0, 1, 0, 1, 0, 1, \ldots)$ zero-heavy?

(b)  If  a  sequence  is  zero-heavy  does  it  necessarily  contain  an  infinite  number 
of  zeros?  If  not,  provide  a  counterexample. 

(c)  If  a  sequence  contains  an  infinite  number  of  zeros,  is  it  necessarily  zero- 
heavy?  If  not,  provide  a  counterexample. 

(d) Form the logical negation of the above definition. That is, complete the sentence: A sequence is not zero-heavy if ... .

2.3 The Algebraic and Order Limit Theorems


The real purpose of creating a rigorous definition for convergence of a sequence is not to have a tool to verify computational statements such as $\lim 2n/(n+2) = 2$.
Historically,  a  definition  of  the  limit  like  Definition  2.2.3  came  150  years  after  the 
founders  of  calculus  began  working  with  intuitive  notions  of  convergence.  The 
point  of  having  such  a  logically  tight  description  of  convergence  is  so  that  we 
can  confidently  prove  statements  about  convergent  sequences  in  general.  We  are 
ultimately  trying  to  resolve  arguments  about  what  is  and  is  not  true  regarding 
the  behavior  of  limits  with  respect  to  the  mathematical  manipulations  we  intend 
to  inflict  on  them. 

As  a  first  example,  let  us  prove  that  convergent  sequences  are  bounded.  The 
term  “bounded”  has  a  rather  familiar  connotation  but,  like  everything  else,  we 
need  to  be  explicit  about  what  it  means  in  this  context. 


Definition 2.3.1. A sequence $(x_n)$ is bounded if there exists a number $M > 0$ such that $|x_n| \le M$ for all $n \in \mathbf{N}$.

Geometrically, this means that we can find an interval $[-M, M]$ that contains every term in the sequence $(x_n)$.


Theorem  2.3.2.  Every  convergent  sequence  is  bounded. 

Proof. Assume $(x_n)$ converges to a limit $l$. This means that given a particular value of $\epsilon$, say $\epsilon = 1$, we know there must exist an $N \in \mathbf{N}$ such that if $n \ge N$, then $x_n$ is in the interval $(l - 1, l + 1)$. Not knowing whether $l$ is positive or negative, we can certainly conclude that
$$|x_n| < |l| + 1$$
for all $n \ge N$.

[Figure: number line showing the early terms $x_1, \ldots, x_5$ scattered around 0, the terms $x_n$ with $n \ge N$ inside $(l - 1, l + 1)$, and the bound $M$.]

We still need to worry (slightly) about the terms in the sequence that come before the $N$th term. Because there are only a finite number of these, we let
$$M = \max\{|x_1|, |x_2|, |x_3|, \ldots, |x_{N-1}|, |l| + 1\}.$$
It follows that $|x_n| \le M$ for all $n \in \mathbf{N}$, as desired. □
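The construction of $M$ in this proof is concrete enough to compute. The Python sketch below applies it to a hypothetical sequence $x_n = 2 + (-1)^n \cdot 5/n$ (an assumption chosen for illustration, with limit 2), for which $N = 6$ responds to $\epsilon = 1$.

```python
from fractions import Fraction

def x(n: int) -> Fraction:
    """Hypothetical convergent sequence x_n = 2 + (-1)^n * 5/n, with limit 2."""
    return 2 + Fraction((-1) ** n * 5, n)

limit = Fraction(2)
N = 6  # for n >= 6, |x_n - 2| = 5/n < 1, so x_n lies in (limit - 1, limit + 1)

# M = max{|x_1|, ..., |x_{N-1}|, |l| + 1}, as in the proof.
M = max([abs(x(n)) for n in range(1, N)] + [abs(limit) + 1])
print(M)
print(all(abs(x(n)) <= M for n in range(1, 5000)))
```

Here the maximum is attained by one of the early terms ($|x_2| = 9/2$) rather than by $|l| + 1 = 3$, which is exactly why the finitely many terms before the $N$th cannot be ignored.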


This chapter began with a demonstration of how applying familiar algebraic properties (commutativity of addition) to infinite objects (series) can lead to paradoxical results. These examples are meant to instill in us a sense of caution and justify the extreme care we are taking in drawing our conclusions. The following theorems illustrate that sequences behave extremely well with respect to the operations of addition, multiplication, division, and order.

Theorem 2.3.3 (Algebraic Limit Theorem). Let $\lim a_n = a$, and $\lim b_n = b$. Then,

(i) $\lim(ca_n) = ca$, for all $c \in \mathbf{R}$;

(ii) $\lim(a_n + b_n) = a + b$;

(iii) $\lim(a_n b_n) = ab$;

(iv) $\lim(a_n/b_n) = a/b$, provided $b \ne 0$.

Proof. (i) Consider the case where $c \ne 0$. We want to show that the sequence $(ca_n)$ converges to $ca$, so the structure of the proof follows the template we described in Section 2.2. First, we let $\epsilon$ be some arbitrary positive number. Our goal is to find some point in the sequence $(ca_n)$ after which we have
$$|ca_n - ca| < \epsilon.$$
Now,
$$|ca_n - ca| = |c||a_n - a|.$$
We are given that $(a_n) \to a$, so we know we can make $|a_n - a|$ as small as we like. In particular, we can choose an $N$ such that
$$|a_n - a| < \frac{\epsilon}{|c|}$$
whenever $n \ge N$. To see that this $N$ indeed works, observe that, for all $n \ge N$,
$$|ca_n - ca| = |c||a_n - a| < |c|\frac{\epsilon}{|c|} = \epsilon.$$

The case $c = 0$ reduces to showing that the constant sequence $(0, 0, 0, \ldots)$ converges to 0, which is easily verified.

Before continuing with parts (ii), (iii), and (iv), we should point out that the proof of (i), while somewhat short, is extremely typical for a convergence proof. Before embarking on a formal argument, it is a good idea to take an inventory of what we want to make less than $\epsilon$, and what we are given can be made small for suitable choices of $n$. For the previous proof, we wanted to make $|ca_n - ca| < \epsilon$, and we were given $|a_n - a| <$ anything we like (for large values of $n$). Notice that in (i), and all of the ensuing arguments, the strategy each time is to bound the quantity we want to be less than $\epsilon$, which in each case is
$$|(\text{terms of sequence}) - (\text{proposed limit})|,$$


with some algebraic combination of quantities over which we have control.

(ii) To prove this statement, we need to argue that the quantity
$$|(a_n + b_n) - (a + b)|$$
can be made less than an arbitrary $\epsilon$ using the assumptions that $|a_n - a|$ and $|b_n - b|$ can be made as small as we like for large $n$. The first step is to use the triangle inequality (Example 1.2.5) to say
$$|(a_n + b_n) - (a + b)| = |(a_n - a) + (b_n - b)| \le |a_n - a| + |b_n - b|.$$

Again, we let $\epsilon > 0$ be arbitrary. The technique this time is to divide the $\epsilon$ between the two expressions on the right-hand side in the preceding inequality. Using the hypothesis that $(a_n) \to a$, we know there exists an $N_1$ such that
$$|a_n - a| < \frac{\epsilon}{2} \quad \text{whenever } n \ge N_1.$$
Likewise, the assumption that $(b_n) \to b$ means that we can choose an $N_2$ so that
$$|b_n - b| < \frac{\epsilon}{2} \quad \text{whenever } n \ge N_2.$$

The question now arises as to which of $N_1$ or $N_2$ we should take to be our choice of $N$. By choosing $N = \max\{N_1, N_2\}$, we ensure that if $n \ge N$, then $n \ge N_1$ and $n \ge N_2$. This allows us to conclude that
$$|(a_n + b_n) - (a + b)| \le |a_n - a| + |b_n - b| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon$$
for all $n \ge N$, as desired.
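The "split $\epsilon$ in half and take the larger index" technique can be traced on a concrete pair of sequences. In the Python sketch below, the sequences $a_n = 1/n$ and $b_n = 1 + 3/n$ and the helpers `N1`, `N2`, `sum_within` are assumptions chosen for illustration; the check covers only finitely many terms.

```python
from fractions import Fraction
import math

def a(n: int) -> Fraction:
    return Fraction(1, n)            # hypothetical example with a_n -> 0

def b(n: int) -> Fraction:
    return 1 + Fraction(3, n)        # hypothetical example with b_n -> 1

def N1(eps: Fraction) -> int:
    return math.floor(2 / eps) + 1   # n >= N1 gives |a_n - 0| = 1/n < eps/2

def N2(eps: Fraction) -> int:
    return math.floor(6 / eps) + 1   # n >= N2 gives |b_n - 1| = 3/n < eps/2

def sum_within(eps: Fraction, how_many: int = 2000) -> bool:
    """Check |(a_n + b_n) - (0 + 1)| < eps for n >= max(N1, N2)."""
    N = max(N1(eps), N2(eps))
    return all(abs((a(n) + b(n)) - 1) < eps for n in range(N, N + how_many))

eps = Fraction(1, 10)
print(N1(eps), N2(eps), sum_within(eps))
```

The two sequences need different amounts of time to get within $\epsilon/2$ of their limits, and taking the maximum guarantees that both requirements hold at once.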


(iii) To show that $(a_n b_n) \to ab$, we begin by observing that
$$\begin{aligned}
|a_n b_n - ab| &= |a_n b_n - a b_n + a b_n - ab| \\
&\le |a_n b_n - a b_n| + |a b_n - ab| \\
&= |b_n||a_n - a| + |a||b_n - b|.
\end{aligned}$$

In the initial step, we subtracted and then added $ab_n$, which created an opportunity to use the triangle inequality. Essentially, we have broken up the distance from $a_n b_n$ to $ab$ with a midway point and are using the sum of the two distances to overestimate the original distance. This clever trick will become a familiar technique in arguments to come.

Letting $\epsilon > 0$ be arbitrary, we again proceed with the strategy of making each piece in the preceding inequality less than $\epsilon/2$. For the piece on the right-hand side ($|a||b_n - b|$), if $a \ne 0$ we can choose $N_1$ so that
$$n \ge N_1 \quad \text{implies} \quad |b_n - b| < \frac{1}{|a|}\frac{\epsilon}{2}.$$

(The case when $a = 0$ is handled in Exercise 2.3.9.) Getting the term on the left-hand side ($|b_n||a_n - a|$) to be less than $\epsilon/2$ is complicated by the fact that we have a variable quantity $|b_n|$ to contend with as opposed to the constant $|a|$ we encountered in the right-hand term. The idea is to replace $|b_n|$ with a worst-case estimate. Using the fact that convergent sequences are bounded (Theorem 2.3.2), we know there exists a bound $M > 0$ satisfying $|b_n| \le M$ for all $n \in \mathbf{N}$. Now, we can choose $N_2$ so that
$$|a_n - a| < \frac{1}{M}\frac{\epsilon}{2} \quad \text{whenever } n \ge N_2.$$

To finish the argument, pick $N = \max\{N_1, N_2\}$, and observe that if $n \ge N$, then
$$\begin{aligned}
|a_n b_n - ab| &\le |a_n b_n - a b_n| + |a b_n - ab| \\
&= |b_n||a_n - a| + |a||b_n - b| \\
&\le M|a_n - a| + |a||b_n - b| \\
&< M\left(\frac{\epsilon}{M2}\right) + |a|\left(\frac{\epsilon}{|a|2}\right) = \epsilon.
\end{aligned}$$

(iv) This final statement will follow from (iii) if we can prove that
$$(b_n) \to b \quad \text{implies} \quad \left(\frac{1}{b_n}\right) \to \frac{1}{b}$$
whenever $b \ne 0$. We begin by observing that
$$\left|\frac{1}{b_n} - \frac{1}{b}\right| = \frac{|b - b_n|}{|b||b_n|}.$$

Because $(b_n) \to b$, we can make the preceding numerator as small as we like by choosing $n$ large. The problem comes in that we need a worst-case estimate on the size of $1/(|b||b_n|)$. Because the $b_n$ terms are in the denominator, we are no longer interested in an upper bound on $|b_n|$ but rather in an inequality of the form $|b_n| \ge \delta > 0$. This will then lead to a bound on the size of $1/(|b||b_n|)$.

The trick is to look far enough out into the sequence $(b_n)$ so that the terms are closer to $b$ than they are to 0. Consider the particular value $\epsilon_0 = |b|/2$. Because $(b_n) \to b$, there exists an $N_1$ such that $|b_n - b| < |b|/2$ for all $n \ge N_1$. This implies $|b_n| > |b|/2$.

Next, choose $N_2$ so that $n \ge N_2$ implies
$$|b_n - b| < \frac{\epsilon |b|^2}{2}.$$
Finally, if we let $N = \max\{N_1, N_2\}$, then $n \ge N$ implies
$$\left|\frac{1}{b_n} - \frac{1}{b}\right| = |b - b_n|\frac{1}{|b||b_n|} < \frac{\epsilon|b|^2}{2}\cdot\frac{1}{|b|\frac{|b|}{2}} = \epsilon.$$
□

Limits and Order

Although  there  are  a  few  dangers  to  avoid  (see  Exercise  2.3.7),  the  Algebraic 
Limit  Theorem  verifies  that  the  relationship  between  algebraic  combinations  of 
sequences  and  the  limiting  process  is  as  trouble-free  as  we  could  hope  for.  Limits 
can  be  computed  from  the  individual  component  sequences  provided  that  each 
component  limit  exists.  The  limiting  process  is  also  well-behaved  with  respect 
to  the  order  operation. 


Theorem 2.3.4 (Order Limit Theorem). Assume $\lim a_n = a$ and $\lim b_n = b$.

(i) If $a_n \ge 0$ for all $n \in \mathbf{N}$, then $a \ge 0$.

(ii) If $a_n \le b_n$ for all $n \in \mathbf{N}$, then $a \le b$.

(iii) If there exists $c \in \mathbf{R}$ for which $c \le b_n$ for all $n \in \mathbf{N}$, then $c \le b$. Similarly, if $a_n \le c$ for all $n \in \mathbf{N}$, then $a \le c$.


Proof. (i) We will prove this by contradiction; thus, let’s assume $a < 0$. The idea is to produce a term in the sequence $(a_n)$ that is also less than zero. To do this, we consider the particular value $\epsilon = |a|$. The definition of convergence guarantees that we can find an $N$ such that $|a_n - a| < |a|$ for all $n \ge N$. In particular, this would mean that $|a_N - a| < |a|$, which implies $a_N < 0$. This contradicts our hypothesis that $a_N \ge 0$. We therefore conclude that $a \ge 0$.

[Figure: number line showing $a - \epsilon$, $a$, $a_N$, and $0 = a + \epsilon$.]

(ii) The Algebraic Limit Theorem ensures that the sequence $(b_n - a_n)$ converges to $b - a$. Because $b_n - a_n \ge 0$, we can apply part (i) to get that $b - a \ge 0$.

(iii) Take $a_n = c$ (or $b_n = c$) for all $n \in \mathbf{N}$, and apply (ii). □

A word about the idea of “tails” is in order. Loosely speaking, limits and their properties do not depend at all on what happens at the beginning of the sequence but are strictly determined by what happens when $n$ gets large. Changing the value of the first ten (or ten thousand) terms in a particular sequence has no effect on the limit. Theorem 2.3.4, part (i), for instance, assumes that $a_n \ge 0$ for all $n \in \mathbf{N}$. However, the hypothesis could be weakened by assuming only that there exists some point $N_1$ where $a_n \ge 0$ for all $n \ge N_1$. The theorem remains true, and in fact the same proof is valid with the provision that when $N$ is chosen it be at least as large as $N_1$.

In the language of analysis, when a property (such as non-negativity) is not necessarily possessed by some finite number of initial terms but is possessed by all terms in the sequence after some point $N$, we say that the sequence eventually has this property. (See Exercise 2.2.7.) Theorem 2.3.4, part (i), could be restated, “Convergent sequences that are eventually nonnegative converge to nonnegative limits.” Parts (ii) and (iii) have similar modifications, as will many other upcoming results.
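The remark that altering finitely many initial terms leaves the limit untouched can be illustrated with a small experiment. In the Python sketch below, the base sequence $a_n = 1/n$, the replacement value 1000 for the first ten terms, and the helper names are all assumptions made for the example.

```python
from fractions import Fraction
import math

def a(n: int) -> Fraction:
    """Original sequence a_n = 1/n, which converges to 0."""
    return Fraction(1, n)

def altered(n: int) -> Fraction:
    """Same sequence with the first ten terms replaced by 1000 (a hypothetical change)."""
    return Fraction(1000) if n <= 10 else a(n)

def N_original(eps: Fraction) -> int:
    return math.floor(1 / eps) + 1          # n >= N gives 1/n < eps

def N_altered(eps: Fraction) -> int:
    return max(N_original(eps), 11)         # also step past the altered terms

for eps in (Fraction(1, 2), Fraction(1, 100)):
    N = N_altered(eps)
    print(eps, N, all(abs(altered(n) - 0) < eps for n in range(N, N + 1000)))
```

Only the response $N$ changes, and only for large $\epsilon$; the limit itself is still 0, in the same way that the proof of Theorem 2.3.4 survives once $N$ is chosen past the exceptional terms.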

Exercises 

Exercise 2.3.1. Let $x_n \ge 0$ for all $n \in \mathbf{N}$.

(a) If $(x_n) \to 0$, show that $(\sqrt{x_n}) \to 0$.

(b) If $(x_n) \to x$, show that $(\sqrt{x_n}) \to \sqrt{x}$.

Exercise 2.3.2. Using only Definition 2.2.3, prove that if $(x_n) \to 2$, then

(a)  (^i)  ->  1; 

(b)  (1/ xn)  —>  1/2. 

(For  this  exercise  the  Algebraic  Limit  Theorem  is  off-limits,  so  to  speak.) 

Exercise 2.3.3 (Squeeze Theorem). Show that if $x_n \le y_n \le z_n$ for all $n \in \mathbf{N}$, and if $\lim x_n = \lim z_n = l$, then $\lim y_n = l$ as well.

Exercise 2.3.4. Let $(a_n) \to 0$, and use the Algebraic Limit Theorem to compute each of the following limits (assuming the fractions are always defined):

(a)  lim  ( 1+£„2°"4ai,  ) 

(b)  lim((an+2j2~4) 

(c)  lim(-2£±q. 

Exercise 2.3.5. Let $(x_n)$ and $(y_n)$ be given, and define $(z_n)$ to be the “shuffled” sequence $(x_1, y_1, x_2, y_2, x_3, y_3, \ldots, x_n, y_n, \ldots)$. Prove that $(z_n)$ is convergent if and only if $(x_n)$ and $(y_n)$ are both convergent with $\lim x_n = \lim y_n$.

Exercise 2.3.6. Consider the sequence given by $b_n = n - \sqrt{n^2 + 2n}$. Taking $(1/n) \to 0$ as given, and using both the Algebraic Limit Theorem and the result in Exercise 2.3.1, show $\lim b_n$ exists and find the value of the limit.

Exercise  2.3.7.  Give  an  example  of  each  of  the  following,  or  state  that  such  a 
request  is  impossible  by  referencing  the  proper  theorem(s): 

(a)  sequences  (xn)  and  (yn),  which  both  diverge,  but  whose  sum  (xn  +  yn) 
converges; 

(b) sequences $(x_n)$ and $(y_n)$, where $(x_n)$ converges, $(y_n)$ diverges, and $(x_n + y_n)$ converges;

(c) a convergent sequence $(b_n)$ with $b_n \ne 0$ for all $n$ such that $(1/b_n)$ diverges;

(d) an unbounded sequence $(a_n)$ and a convergent sequence $(b_n)$ with $(a_n - b_n)$ bounded;

(e) two sequences $(a_n)$ and $(b_n)$, where $(a_n b_n)$ and $(a_n)$ converge but $(b_n)$ does not.


Exercise 2.3.8. Let $(x_n) \to x$ and let $p(x)$ be a polynomial.

(a) Show $p(x_n) \to p(x)$.

(b) Find an example of a function $f(x)$ and a convergent sequence $(x_n) \to x$ where the sequence $f(x_n)$ converges, but not to $f(x)$.

Exercise 2.3.9. (a) Let $(a_n)$ be a bounded (not necessarily convergent) sequence, and assume $\lim b_n = 0$. Show that $\lim(a_n b_n) = 0$. Why are we not allowed to use the Algebraic Limit Theorem to prove this?

(b) Can we conclude anything about the convergence of $(a_n b_n)$ if we assume that $(b_n)$ converges to some nonzero limit $b$?

(c) Use (a) to prove Theorem 2.3.3, part (iii), for the case when $a = 0$.

Exercise  2.3.10.  Consider  the  following  list  of  conjectures.  Provide  a  short 
proof  for  those  that  are  true  and  a  counterexample  for  any  that  are  false. 


(a) If $\lim(a_n - b_n) = 0$, then $\lim a_n = \lim b_n$.

(b) If $(b_n) \to b$, then $|b_n| \to |b|$.

(c) If $(a_n) \to a$ and $(b_n - a_n) \to 0$, then $(b_n) \to a$.

(d) If $(a_n) \to 0$ and $|b_n - b| \le a_n$ for all $n \in \mathbf{N}$, then $(b_n) \to b$.

Exercise 2.3.11 (Cesaro Means). (a) Show that if $(x_n)$ is a convergent sequence, then the sequence given by the averages
$$y_n = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
also converges to the same limit.

(b) Give an example to show that it is possible for the sequence $(y_n)$ of averages to converge even if $(x_n)$ does not.

Exercise 2.3.12. A typical task in analysis is to decipher whether a property possessed by every term in a convergent sequence is necessarily inherited by the limit. Assume $(a_n) \to a$, and determine the validity of each claim. Try to produce a counterexample for any that are false.

(a) If every $a_n$ is an upper bound for a set $B$, then $a$ is also an upper bound for $B$.

(b) If every $a_n$ is in the complement of the interval $(0, 1)$, then $a$ is also in the complement of $(0, 1)$.

(c) If every $a_n$ is rational, then $a$ is rational.

Exercise 2.3.13 (Iterated Limits). Given a doubly indexed array $a_{mn}$ where $m, n \in \mathbf{N}$, what should $\lim_{m,n\to\infty} a_{mn}$ represent?

(a) Let $a_{mn} = m/(m + n)$ and compute the iterated limits
$$\lim_{n\to\infty}\left(\lim_{m\to\infty} a_{mn}\right) \quad \text{and} \quad \lim_{m\to\infty}\left(\lim_{n\to\infty} a_{mn}\right).$$

Define $\lim_{m,n\to\infty} a_{mn} = a$ to mean that for all $\epsilon > 0$ there exists an $N \in \mathbf{N}$ such that if both $m, n \ge N$, then $|a_{mn} - a| < \epsilon$.

(b) Let $a_{mn} = 1/(m + n)$. Does $\lim_{m,n\to\infty} a_{mn}$ exist in this case? Do the two iterated limits exist? How do these three values compare? Answer these same questions for $a_{mn} = mn/(m^2 + n^2)$.

(c) Produce an example where $\lim_{m,n\to\infty} a_{mn}$ exists but where neither iterated limit can be computed.

(d) Assume $\lim_{m,n\to\infty} a_{mn} = a$, and assume that for each fixed $m \in \mathbf{N}$, $\lim_{n\to\infty}(a_{mn}) \to b_m$. Show $\lim_{m\to\infty} b_m = a$.

(e) Prove that if $\lim_{m,n\to\infty} a_{mn}$ exists and the iterated limits both exist, then all three limits must be equal.


2.4  The  Monotone  Convergence  Theorem 
and  a  First  Look  at  Infinite  Series 

We  showed  in  Theorem  2.3.2  that  convergent  sequences  are  bounded.  The 
converse  statement  is  certainly  not  true.  It  is  not  too  difficult  to  produce  an 
example  of  a  bounded  sequence  that  does  not  converge.  On  the  other  hand,  if 
a  bounded  sequence  is  monotone ,  then  in  fact  it  does  converge. 

Definition 2.4.1. A sequence $(a_n)$ is increasing if $a_n \le a_{n+1}$ for all $n \in \mathbf{N}$ and decreasing if $a_n \ge a_{n+1}$ for all $n \in \mathbf{N}$. A sequence is monotone if it is either increasing or decreasing.

Theorem 2.4.2 (Monotone Convergence Theorem). If a sequence is monotone and bounded, then it converges.

Proof. Let $(a_n)$ be monotone and bounded. To prove $(a_n)$ converges using the definition of convergence, we are going to need a candidate for the limit. Let’s assume the sequence is increasing (the decreasing case is handled similarly), and consider the set of points $\{a_n : n \in \mathbf{N}\}$. By assumption, this set is bounded, so we can let
$$s = \sup\{a_n : n \in \mathbf{N}\}.$$
It seems reasonable to claim that $\lim a_n = s$.

[Figure: number line showing $s = \sup\{a_n : n \in \mathbf{N}\}$ inside the interval from $s - \epsilon$ to $s + \epsilon$.]

To prove this, let $\epsilon > 0$. Because $s$ is the least upper bound for $\{a_n : n \in \mathbf{N}\}$, $s - \epsilon$ is not an upper bound, so there exists a point in the sequence $a_N$ such that $s - \epsilon < a_N$. Now, the fact that $(a_n)$ is increasing implies that if $n \ge N$, then $a_N \le a_n$. Hence,
$$s - \epsilon < a_N \le a_n \le s < s + \epsilon,$$
which implies $|a_n - s| < \epsilon$, as desired. □
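The key step of this proof, locating a term $a_N$ beyond $s - \epsilon$ and then relying on monotonicity, can be traced on an example. The Python sketch below assumes the hypothetical increasing sequence $a_n = n/(n+1)$, whose supremum is 1; the helper name `first_N` is illustrative, and in general the supremum is not available in closed form as it is here.

```python
from fractions import Fraction

def a(n: int) -> Fraction:
    """Hypothetical increasing, bounded sequence a_n = n/(n + 1)."""
    return Fraction(n, n + 1)

s = Fraction(1)  # sup{a_n : n in N} for this particular sequence

def first_N(eps: Fraction) -> int:
    """First index N with s - eps < a_N; it exists because s - eps is not an upper bound."""
    N = 1
    while a(N) <= s - eps:
        N += 1
    return N

eps = Fraction(1, 100)
N = first_N(eps)
print(N, all(abs(a(n) - s) < eps for n in range(N, N + 1000)))
```

Once a single term exceeds $s - \epsilon$, every later term is trapped between that term and $s$, so no further searching is needed.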


The  Monotone  Convergence  Theorem  is  extremely  useful  for  the  study  of 
infinite  series,  largely  because  it  asserts  the  convergence  of  a  sequence  without 
explicit  mention  of  the  actual  limit.  This  is  a  good  moment  to  do  some  prelimi¬ 
nary  investigations,  so  it  is  time  to  formalize  the  relationship  between  sequences 
and  series. 

Definition 2.4.3 (Convergence of a Series). Let $(b_n)$ be a sequence. An infinite series is a formal expression of the form
$$\sum_{n=1}^{\infty} b_n = b_1 + b_2 + b_3 + b_4 + b_5 + \cdots.$$
We define the corresponding sequence of partial sums $(s_m)$ by
$$s_m = b_1 + b_2 + b_3 + \cdots + b_m,$$
and say that the series $\sum_{n=1}^{\infty} b_n$ converges to $B$ if the sequence $(s_m)$ converges to $B$. In this case, we write $\sum_{n=1}^{\infty} b_n = B$.


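Example 2.4.4. Consider

\[ \sum_{n=1}^{\infty} \frac{1}{n^2}. \]

Because the terms in the sum are all positive, the sequence of partial sums
given by

\[ s_m = 1 + \frac{1}{4} + \frac{1}{9} + \cdots + \frac{1}{m^2} \]

is increasing. The question is whether or not we can find some upper bound on
$(s_m)$. To this end, observe

\[
\begin{aligned}
s_m &= 1 + \frac{1}{2 \cdot 2} + \frac{1}{3 \cdot 3} + \frac{1}{4 \cdot 4} + \cdots + \frac{1}{m^2} \\
    &< 1 + \frac{1}{2 \cdot 1} + \frac{1}{3 \cdot 2} + \frac{1}{4 \cdot 3} + \cdots + \frac{1}{m(m-1)} \\
    &= 1 + \left(1 - \frac{1}{2}\right) + \left(\frac{1}{2} - \frac{1}{3}\right) + \left(\frac{1}{3} - \frac{1}{4}\right) + \cdots + \left(\frac{1}{m-1} - \frac{1}{m}\right) \\
    &= 1 + 1 - \frac{1}{m} \\
    &< 2.
\end{aligned}
\]

Thus, 2 is an upper bound for the sequence of partial sums, so by the Monotone
Convergence Theorem, $\sum_{n=1}^{\infty} 1/n^2$ converges to some (for the moment)
unknown limit less than 2. (Finding the value of this limit is the subject of
Sections 6.1 and 8.3.)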
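
The estimate in Example 2.4.4 can be checked numerically. The following is a minimal sketch, not part of the original argument: the function names `partial_sum` and `telescoping_bound` are our own, and exact rational arithmetic (`fractions.Fraction`) is used so that the comparison with $2 - 1/m$ is not blurred by rounding.

```python
from fractions import Fraction

def partial_sum(m):
    """Return s_m = 1 + 1/4 + ... + 1/m^2 as an exact fraction."""
    return sum(Fraction(1, n * n) for n in range(1, m + 1))

def telescoping_bound(m):
    """Return 2 - 1/m, the value of the telescoping sum in the estimate."""
    return 2 - Fraction(1, m)

if __name__ == "__main__":
    for m in (1, 2, 10, 100, 1000):
        # The partial sums increase with m but never pass 2 - 1/m.
        print(m, float(partial_sum(m)), float(telescoping_bound(m)))
```

A computation of this kind only illustrates the bound for the values of $m$ that are tried; the inequality for every $m$ still rests on the telescoping argument above.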

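Example 2.4.5 (Harmonic Series). This time, consider the so-called harmonic
series

\[ \sum_{n=1}^{\infty} \frac{1}{n}. \]

Again, we have an increasing sequence of partial sums,

\[ s_m = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{m}, \]

that upon naive inspection appears as though it may be bounded. However, 2
is no longer an upper bound because

\[ s_4 = 1 + \frac{1}{2} + \left(\frac{1}{3} + \frac{1}{4}\right) > 1 + \frac{1}{2} + \left(\frac{1}{4} + \frac{1}{4}\right) = 2. \]

A similar calculation shows that $s_8 > 2\tfrac{1}{2}$, and we can see that in general

\[
\begin{aligned}
s_{2^k} &= 1 + \frac{1}{2} + \left(\frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \cdots + \frac{1}{8}\right) + \cdots + \left(\frac{1}{2^{k-1}+1} + \cdots + \frac{1}{2^k}\right) \\
        &> 1 + \frac{1}{2} + \left(\frac{1}{4} + \frac{1}{4}\right) + \left(\frac{1}{8} + \cdots + \frac{1}{8}\right) + \cdots + \left(\frac{1}{2^k} + \cdots + \frac{1}{2^k}\right) \\
        &= 1 + \frac{1}{2} + 2\left(\frac{1}{4}\right) + 4\left(\frac{1}{8}\right) + \cdots + 2^{k-1}\left(\frac{1}{2^k}\right) \\
        &= 1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \cdots + \frac{1}{2} \\
        &= 1 + k\left(\frac{1}{2}\right),
\end{aligned}
\]

which is unbounded. Thus, despite the incredibly slow pace, the sequence of
partial sums of $\sum_{n=1}^{\infty} 1/n$ eventually surpasses every number on the
positive real line. Because convergent sequences are bounded, the harmonic series
diverges.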
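
To get a feel for the "incredibly slow pace," here is a small sketch that compares $s_{2^k}$ with the lower bound $1 + k/2$ and finds the first partial sum that exceeds a given number. The helper names (`harmonic`, `grouped_lower_bound`, `first_index_exceeding`) are ours, introduced only for illustration, and exact fractions are an implementation choice.

```python
from fractions import Fraction

def harmonic(m):
    """Return s_m = 1 + 1/2 + ... + 1/m exactly."""
    return sum(Fraction(1, n) for n in range(1, m + 1))

def grouped_lower_bound(k):
    """Return 1 + k/2, the lower bound for s_{2^k} from the grouping argument."""
    return 1 + Fraction(k, 2)

def first_index_exceeding(bound):
    """Return the smallest m with s_m > bound (terminates because (s_m) is unbounded)."""
    total, m = Fraction(0), 0
    while total <= bound:
        m += 1
        total += Fraction(1, m)
    return m

if __name__ == "__main__":
    for k in range(0, 8):
        print(k, float(harmonic(2 ** k)), float(grouped_lower_bound(k)))
```

The loop in `first_index_exceeding` is guaranteed to stop only because of the divergence just proved; the code does not establish that fact on its own.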
The previous example is a special case of a general argument that can be
used to determine the convergence or divergence of a large class of infinite series.

Theorem 2.4.6 (Cauchy Condensation Test). Suppose $(b_n)$ is decreasing
and satisfies $b_n \ge 0$ for all $n \in \mathbf{N}$. Then, the series
$\sum_{n=1}^{\infty} b_n$ converges if and only if the series

\[ \sum_{n=0}^{\infty} 2^n b_{2^n} = b_1 + 2b_2 + 4b_4 + 8b_8 + 16b_{16} + \cdots \]

converges.

Proof. First, assume that $\sum_{n=0}^{\infty} 2^n b_{2^n}$ converges. Theorem 2.3.2
guarantees that the partial sums

\[ t_k = b_1 + 2b_2 + 4b_4 + \cdots + 2^k b_{2^k} \]

are bounded; that is, there exists an $M > 0$ such that $t_k \le M$ for all
$k \in \mathbf{N}$. We want to prove that $\sum_{n=1}^{\infty} b_n$ converges.
Because $b_n \ge 0$, we know that the partial sums are increasing, so we only need
to show that

\[ s_m = b_1 + b_2 + b_3 + \cdots + b_m \]

is bounded.

Fix $m$ and let $k$ be large enough to ensure $m \le 2^{k+1} - 1$. Then,
$s_m \le s_{2^{k+1}-1}$ and

\[
\begin{aligned}
s_{2^{k+1}-1} &= b_1 + (b_2 + b_3) + (b_4 + b_5 + b_6 + b_7) + \cdots + (b_{2^k} + \cdots + b_{2^{k+1}-1}) \\
              &\le b_1 + (b_2 + b_2) + (b_4 + b_4 + b_4 + b_4) + \cdots + (b_{2^k} + \cdots + b_{2^k}) \\
              &= b_1 + 2b_2 + 4b_4 + \cdots + 2^k b_{2^k} = t_k.
\end{aligned}
\]

Thus, $s_m \le t_k \le M$, and the sequence $(s_m)$ is bounded. By the Monotone
Convergence Theorem, we can conclude that $\sum_{n=1}^{\infty} b_n$ converges.

The proof that $\sum_{n=0}^{\infty} 2^n b_{2^n}$ diverges implies
$\sum_{n=1}^{\infty} b_n$ diverges is similar to Example 2.4.5. The details are
requested in Exercise 2.4.9. □
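
The key inequality $s_{2^{k+1}-1} \le t_k$ from the proof can be tested on concrete sequences. The sketch below is only an illustration under our own naming (`s`, `t`, `inv_square`, `inv` are hypothetical helpers); it uses $b_n = 1/n^2$ and $b_n = 1/n$, for which the condensed partial sums work out to $2 - 1/2^k$ and $k + 1$, respectively.

```python
from fractions import Fraction

def s(b, m):
    """Partial sum b(1) + b(2) + ... + b(m) of the original series."""
    return sum(b(n) for n in range(1, m + 1))

def t(b, k):
    """Partial sum b(1) + 2 b(2) + 4 b(4) + ... + 2^k b(2^k) of the condensed series."""
    return sum(2 ** j * b(2 ** j) for j in range(0, k + 1))

def inv_square(n):
    """b_n = 1/n^2: decreasing and nonnegative."""
    return Fraction(1, n * n)

def inv(n):
    """b_n = 1/n: decreasing and nonnegative."""
    return Fraction(1, n)

if __name__ == "__main__":
    for k in range(0, 6):
        m = 2 ** (k + 1) - 1
        print(k, float(s(inv_square, m)), float(t(inv_square, k)),
              float(s(inv, m)), float(t(inv, k)))
```

For $b_n = 1/n^2$ the condensed sums stay below 2, while for $b_n = 1/n$ they grow without bound, which matches the conclusions of Examples 2.4.4 and 2.4.5.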

Corollary 2.4.7. The series $\sum_{n=1}^{\infty} 1/n^p$ converges if and only if $p > 1$.

A rigorous argument for this corollary requires a few basic facts about geometric
series. The proof is requested in Exercise 2.7.5 at the end of Section 2.7
where geometric series are discussed.

Exercises 

Exercise 2.4.1. (a) Prove that the sequence defined by $x_1 = 3$ and

\[ x_{n+1} = \frac{1}{4 - x_n} \]

converges.

(b) Now that we know $\lim x_n$ exists, explain why $\lim x_{n+1}$ must also exist and
equal the same value.

(c) Take the limit of each side of the recursive equation in part (a) to explicitly
compute $\lim x_n$.

Exercise 2.4.2. (a) Consider the recursively defined sequence $y_1 = 1$,

\[ y_{n+1} = 3 - y_n, \]

and set $y = \lim y_n$. Because $(y_n)$ and $(y_{n+1})$ have the same limit, taking
the limit across the recursive equation gives $y = 3 - y$. Solving for $y$, we
conclude $\lim y_n = 3/2$.

What is wrong with this argument?

(b) This time set $y_1 = 1$ and $y_{n+1} = 3 - \frac{1}{y_n}$. Can the strategy in (a)
be applied to compute the limit of this sequence?

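Exercise 2.4.3. (a) Show that

\[ \sqrt{2},\ \sqrt{2 + \sqrt{2}},\ \sqrt{2 + \sqrt{2 + \sqrt{2}}},\ \ldots \]

converges and find the limit.

(b) Does the sequence

\[ \sqrt{2},\ \sqrt{2\sqrt{2}},\ \sqrt{2\sqrt{2\sqrt{2}}},\ \ldots \]

converge? If so, find the limit.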

Exercise 2.4.4. (a) In Section 1.4 we used the Axiom of Completeness (AoC)
to prove the Archimedean Property of $\mathbf{R}$ (Theorem 1.4.2). Show that the
Monotone Convergence Theorem can also be used to prove the Archimedean
Property without making any use of AoC.

(b) Use the Monotone Convergence Theorem to supply a proof for the Nested
Interval Property (Theorem 1.4.1) that doesn't make use of AoC.

These two results suggest that we could have used the Monotone Convergence
Theorem in place of AoC as our starting axiom for building a
proper theory of the real numbers.

Exercise 2.4.5 (Calculating Square Roots). Let $x_1 = 2$, and define

\[ x_{n+1} = \frac{1}{2}\left(x_n + \frac{2}{x_n}\right). \]

(a) Show that $x_n^2$ is always greater than or equal to 2, and then use this to
prove that $x_n - x_{n+1} \ge 0$. Conclude that $\lim x_n = \sqrt{2}$.

(b) Modify the sequence $(x_n)$ so that it converges to $\sqrt{c}$.
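
Before attempting the proof, it can help to look at the first few terms. The sketch below is our own illustration (the name `sqrt_iterates` is hypothetical); it computes the terms of the recursion exactly and lets one observe, for the terms computed, that $x_n^2 \ge 2$ and that the sequence decreases. It is not a substitute for the argument the exercise asks for.

```python
from fractions import Fraction

def sqrt_iterates(count):
    """Return the first `count` terms of x_1 = 2, x_{n+1} = (x_n + 2/x_n)/2."""
    x = Fraction(2)
    terms = []
    for _ in range(count):
        terms.append(x)
        x = (x + 2 / x) / 2
    return terms

if __name__ == "__main__":
    for n, x in enumerate(sqrt_iterates(6), start=1):
        # x*x - 2 shrinks rapidly while staying nonnegative.
        print(n, x, float(x), float(x * x - 2))
```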

Exercise 2.4.6 (Arithmetic-Geometric Mean). (a) Explain why
$\sqrt{xy} \le (x + y)/2$ for any two positive real numbers $x$ and $y$. (The
geometric mean is always less than the arithmetic mean.)

(b) Now let $0 \le x_1 \le y_1$ and define

\[ x_{n+1} = \sqrt{x_n y_n} \quad \text{and} \quad y_{n+1} = \frac{x_n + y_n}{2}. \]

Show $\lim x_n$ and $\lim y_n$ both exist and are equal.

Exercise 2.4.7 (Limit Superior). Let $(a_n)$ be a bounded sequence.

(a) Prove that the sequence defined by $y_n = \sup\{a_k : k \ge n\}$ converges.

(b) The limit superior of $(a_n)$, or $\limsup a_n$, is defined by

\[ \limsup a_n = \lim y_n, \]

where $y_n$ is the sequence from part (a) of this exercise. Provide a reasonable
definition for $\liminf a_n$ and briefly explain why it always exists for
any bounded sequence.

(c) Prove that $\liminf a_n \le \limsup a_n$ for every bounded sequence, and give
an example of a sequence for which the inequality is strict.

(d) Show that $\liminf a_n = \limsup a_n$ if and only if $\lim a_n$ exists. In this
case, all three share the same value.


Exercise 2.4.8. For each series, find an explicit formula for the sequence of
partial sums and determine if the series converges.

(a) $\sum_{n=1}^{\infty} \frac{1}{2^n}$    (b) $\sum_{n=1}^{\infty} \frac{1}{n(n+1)}$    (c) $\sum_{n=1}^{\infty} \log\left(\frac{n+1}{n}\right)$

(In (c), $\log(x)$ refers to the natural logarithm function from calculus.)

Exercise 2.4.9. Complete the proof of Theorem 2.4.6 by showing that if the
series $\sum_{n=0}^{\infty} 2^n b_{2^n}$ diverges, then so does
$\sum_{n=1}^{\infty} b_n$. Example 2.4.5 may be a useful reference.


Exercise 2.4.10 (Infinite Products). A close relative of infinite series is the
infinite product

\[ \prod_{n=1}^{\infty} b_n = b_1 b_2 b_3 \cdots \]

which is understood in terms of its sequence of partial products

\[ p_m = \prod_{n=1}^{m} b_n = b_1 b_2 b_3 \cdots b_m. \]

Consider the special class of infinite products of the form

\[ \prod_{n=1}^{\infty} (1 + a_n) = (1 + a_1)(1 + a_2)(1 + a_3) \cdots, \quad \text{where } a_n \ge 0. \]

(a) Find an explicit formula for the sequence of partial products in the case
where $a_n = 1/n$ and decide whether the sequence converges. Write out
the first few terms in the sequence of partial products in the case where
$a_n = 1/n^2$ and make a conjecture about the convergence of this sequence.

(b) Show, in general, that the sequence of partial products converges if and
only if $\sum_{n=1}^{\infty} a_n$ converges. (The inequality $1 + x \le 3^x$ for
positive $x$ will be useful in one direction.)


2.5  Subsequences and the Bolzano-Weierstrass Theorem


In Example 2.4.5, we showed that the sequence of partial sums $(s_m)$ of the
harmonic series does not converge by focusing our attention on a particular
subsequence $(s_{2^k})$ of the original sequence. For the moment, we will put the
topic of infinite series aside and more fully develop the important concept of
subsequences.

Definition 2.5.1. Let $(a_n)$ be a sequence of real numbers, and let
$n_1 < n_2 < n_3 < n_4 < n_5 < \cdots$ be an increasing sequence of natural
numbers. Then the sequence

\[ (a_{n_1}, a_{n_2}, a_{n_3}, a_{n_4}, a_{n_5}, \ldots) \]

is called a subsequence of $(a_n)$ and is denoted by $(a_{n_k})$, where
$k \in \mathbf{N}$ indexes the subsequence.

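Notice that the order of the terms in a subsequence is the same as in the
original sequence, and repetitions are not allowed. Thus if

\[ (a_n) = \left(1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \frac{1}{5}, \frac{1}{6}, \ldots\right), \]

then

\[ \left(\frac{1}{2}, \frac{1}{4}, \frac{1}{6}, \frac{1}{8}, \ldots\right) \]

and

\[ \left(\frac{1}{10}, \frac{1}{100}, \frac{1}{1000}, \frac{1}{10000}, \ldots\right) \]

are examples of legitimate subsequences, whereas

\[ \left(\frac{1}{10}, \frac{1}{5}, \frac{1}{100}, \frac{1}{50}, \frac{1}{1000}, \frac{1}{500}, \ldots\right) \]

and

\[ \left(1, 1, \frac{1}{3}, \frac{1}{3}, \frac{1}{5}, \frac{1}{5}, \ldots\right) \]

are not.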
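
The requirement in Definition 2.5.1 is entirely about the indices: the chosen $n_k$ must be strictly increasing. A short sketch makes this concrete for finite initial segments of the four examples above; the function names `is_valid_index_choice`, `a`, and `pick` are our own and are introduced only for illustration.

```python
from fractions import Fraction

def is_valid_index_choice(indices):
    """Return True when the indices are natural numbers with n_1 < n_2 < n_3 < ..."""
    return all(n >= 1 for n in indices) and all(
        earlier < later for earlier, later in zip(indices, indices[1:])
    )

def a(n):
    """The sequence a_n = 1/n used in the examples above."""
    return Fraction(1, n)

def pick(indices):
    """Return the terms (a_{n_1}, a_{n_2}, ...) selected by the given indices."""
    return [a(n) for n in indices]

if __name__ == "__main__":
    for indices in ([2, 4, 6, 8], [10, 100, 1000, 10000],
                    [10, 5, 100, 50, 1000, 500], [1, 1, 3, 3, 5, 5]):
        print([str(x) for x in pick(indices)], is_valid_index_choice(indices))
```

The third choice fails because the indices are out of order, and the fourth fails because indices are repeated.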
Theorem 2.5.2. Subsequences of a convergent sequence converge to the same
limit as the original sequence.

Proof. Assume $(a_n) \to a$, and let $(a_{n_k})$ be a subsequence. Given
$\epsilon > 0$, there exists $N$ such that $|a_n - a| < \epsilon$ whenever
$n \ge N$. Because $n_k \ge k$ for all $k$, the same $N$ will suffice for the
subsequence; that is, $|a_{n_k} - a| < \epsilon$ whenever $k \ge N$. □


This  not  too  surprising  result  has  several  somewhat  surprising  applications. 
It  is  the  key  ingredient  for  understanding  when  infinite  sums  are  associative 
(Exercise  2.5.3).  We  can  also  use  it  in  the  following  clever  way  to  compute 
values  of  some  familiar  limits. 

Example 2.5.3. Let $0 < b < 1$. Because

\[ b > b^2 > b^3 > b^4 > \cdots > 0, \]

the sequence $(b^n)$ is decreasing and bounded below. The Monotone Convergence
Theorem allows us to conclude that $(b^n)$ converges to some $l$ satisfying
$b > l \ge 0$. To compute $l$, notice that $(b^{2n})$ is a subsequence, so
$(b^{2n}) \to l$ by Theorem 2.5.2. But $b^{2n} = b^n \cdot b^n$, so by the
Algebraic Limit Theorem, $(b^{2n}) \to l \cdot l = l^2$. Because limits are unique
(Theorem 2.2.7), $l^2 = l$, so $l = 0$ or $l = 1$. Since $l < b < 1$, we conclude
$l = 0$.

Without much trouble (Exercise 2.5.7), we can generalize this example to
conclude $(b^n) \to 0$ if and only if $-1 < b < 1$.

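Example 2.5.4 (Divergence Criterion). Theorem 2.5.2 is also useful for
providing economical proofs for divergence. In Example 2.2.8, we were quite
sure that

\[ \left(1, -\frac{1}{2}, \frac{1}{3}, -\frac{1}{4}, \frac{1}{5}, -\frac{1}{5}, \frac{1}{5}, -\frac{1}{5}, \frac{1}{5}, -\frac{1}{5}, \frac{1}{5}, -\frac{1}{5}, \ldots\right) \]

did not converge to any proposed limit. Notice that

\[ \left(\frac{1}{5}, \frac{1}{5}, \frac{1}{5}, \frac{1}{5}, \frac{1}{5}, \ldots\right) \]

is a subsequence that converges to $1/5$. Also,

\[ \left(-\frac{1}{5}, -\frac{1}{5}, -\frac{1}{5}, -\frac{1}{5}, -\frac{1}{5}, \ldots\right) \]

is a different subsequence of the original sequence that converges to $-1/5$.
Because we have two subsequences converging to two different limits, we can
rigorously conclude that the original sequence diverges.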
The Bolzano-Weierstrass Theorem

In the previous example, it was rather easy to spot a convergent subsequence
(or two) hiding in the original sequence. For bounded sequences, it turns out
that it is always possible to find at least one such convergent subsequence.

Theorem 2.5.5 (Bolzano-Weierstrass Theorem). Every bounded sequence
contains a convergent subsequence.


Proof. Let $(a_n)$ be a bounded sequence so that there exists $M > 0$ satisfying
$|a_n| \le M$ for all $n \in \mathbf{N}$. Bisect the closed interval $[-M, M]$ into
the two closed intervals $[-M, 0]$ and $[0, M]$. (The midpoint is included in both
halves.) Now, it must be that at least one of these closed intervals contains an
infinite number of the terms in the sequence $(a_n)$. Select a half for which this
is the case and label that interval as $I_1$. Then, let $a_{n_1}$ be some term in
the sequence $(a_n)$ satisfying $a_{n_1} \in I_1$.

[Figure: number line from $-M$ to $M$ with midpoint $0$, showing the nested intervals $I_1$ and $I_2$ and the term $a_{n_2}$.]

Next, we bisect $I_1$ into closed intervals of equal length, and let $I_2$ be a half
that again contains an infinite number of terms of the original sequence. Because
there are an infinite number of terms from $(a_n)$ to choose from, we can select
an $a_{n_2}$ from the original sequence with $n_2 > n_1$ and $a_{n_2} \in I_2$. In
general, we construct the closed interval $I_k$ by taking a half of $I_{k-1}$
containing an infinite number of terms of $(a_n)$ and then select
$n_k > n_{k-1} > \cdots > n_2 > n_1$ so that $a_{n_k} \in I_k$.

We want to argue that $(a_{n_k})$ is a convergent subsequence, but we need a
candidate for the limit. The sets

\[ I_1 \supseteq I_2 \supseteq I_3 \supseteq \cdots \]

form a nested sequence of closed intervals, and by the Nested Interval Property
there exists at least one point $x \in \mathbf{R}$ contained in every $I_k$. This
provides us with the candidate we were looking for. It just remains to show that
$(a_{n_k}) \to x$.

Let $\epsilon > 0$. By construction, the length of $I_k$ is $M(1/2)^{k-1}$ which
converges to zero. (This follows from Example 2.5.3 and the Algebraic Limit
Theorem.) Choose $N$ so that $k \ge N$ implies that the length of $I_k$ is less
than $\epsilon$. Because $x$ and $a_{n_k}$ are both in $I_k$, it follows that
$|a_{n_k} - x| < \epsilon$. □
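
The bisection construction in this proof can be imitated on a computer, with one important caveat: a program only ever sees finitely many terms, so it cannot decide which half contains infinitely many of them. The sketch below is therefore a heuristic illustration, not an implementation of the proof. It assumes a finite list of terms and replaces "contains infinitely many terms" with the proxy "contains at least as many of the remaining terms as the other half." The function name `bisection_subsequence` and its parameters are our own.

```python
def bisection_subsequence(terms, bound, steps):
    """Mimic the bisection construction on a finite list of terms.

    terms[i] stands for a_{i+1}, and bound is a number M with |a_n| <= M.
    Assumption (finite proxy): at each step we keep the half holding at
    least as many of the remaining terms as the other half.
    Returns the chosen 1-based indices n_1 < n_2 < ... and the last interval.
    """
    lo, hi = -bound, bound
    candidates = list(range(len(terms)))  # positions of terms inside [lo, hi]
    chosen = []
    last = -1
    for _ in range(steps):
        mid = (lo + hi) / 2
        left = [i for i in candidates if terms[i] <= mid]
        right = [i for i in candidates if terms[i] >= mid]
        if len(left) >= len(right):
            hi, candidates = mid, left
        else:
            lo, candidates = mid, right
        later = [i for i in candidates if i > last]
        if not later:
            break  # the finite sample has no later term in this interval
        last = later[0]
        chosen.append(last + 1)
    return chosen, (lo, hi)

if __name__ == "__main__":
    # Sample bounded, divergent sequence a_n = (-1)^n (1 + 1/n), with M = 2.
    sample = [(-1) ** n * (1 + 1 / n) for n in range(1, 2001)]
    indices, interval = bisection_subsequence(sample, 2, 8)
    print(indices, interval)
    print([sample[n - 1] for n in indices])
```

On this sample the selected terms all come from the odd-indexed terms and settle toward $-1$, and after eight steps the interval has length $M(1/2)^{7}$ with $M = 2$, in agreement with the length formula in the proof.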
Exercises

Exercise  2.5.1.  Give  an  example  of  each  of  the  following,  or  argue  that  such 
a  request  is  impossible. 

(a) A sequence that has a subsequence that is bounded but contains no
subsequence that converges.

(b) A sequence that does not contain 0 or 1 as a term but contains
subsequences converging to each of these values.

(c)  A  sequence  that  contains  subsequences  converging  to  every  point  in  the 
infinite  set  {1, 1/2, 1/3, 1/4, 1/5, . . .}. 

(d)  A  sequence  that  contains  subsequences  converging  to  every  point  in  the 
infinite  set  {1, 1/2, 1/3, 1/4, 1/5,...},  and  no  subsequences  converging  to 
points  outside  of  this  set. 

Exercise  2.5.2.  Decide  whether  the  following  propositions  are  true  or  false, 
providing  a  short  justification  for  each  conclusion. 

(a)  If  every  proper  subsequence  of  (xn)  converges,  then  (xn)  converges  as  well. 

(b)  If  (xn)  contains  a  divergent  subsequence,  then  (xn)  diverges. 

(c)  If  (xn)  is  bounded  and  diverges,  then  there  exist  two  subsequences  of  (xn) 
that  converge  to  different  limits. 

(d) If $(x_n)$ is monotone and contains a convergent subsequence, then $(x_n)$
converges.

Exercise 2.5.3. (a) Prove that if an infinite series converges, then the
associative property holds. Assume $a_1 + a_2 + a_3 + a_4 + a_5 + \cdots$ converges
to a limit $L$ (i.e., the sequence of partial sums $(s_n) \to L$). Show that any
regrouping of the terms

\[ (a_1 + a_2 + \cdots + a_{n_1}) + (a_{n_1+1} + \cdots + a_{n_2}) + (a_{n_2+1} + \cdots + a_{n_3}) + \cdots \]

leads to a series that also converges to $L$.

(b)  Compare  this  result  to  the  example  discussed  at  the  end  of  Section  2.1 
where  infinite  addition  was  shown  not  to  be  associative.  Why  doesn’t  our 
proof  in  (a)  apply  to  this  example? 

Exercise 2.5.4. The Bolzano-Weierstrass Theorem is extremely important,
and so is the strategy employed in the proof. To gain some more experience
with this technique, assume the Nested Interval Property is true and use it
to provide a proof of the Axiom of Completeness. To prevent the argument
from being circular, assume also that $(1/2^n) \to 0$. (Why precisely is this last
assumption needed to avoid circularity?)
Exercise 2.5.5. Assume $(a_n)$ is a bounded sequence with the property that
every convergent subsequence of $(a_n)$ converges to the same limit
$a \in \mathbf{R}$. Show that $(a_n)$ must converge to $a$.


Exercise 2.5.6. Use a similar strategy to the one in Example 2.5.3 to show
$\lim b^{1/n}$ exists for all $b \ge 0$ and find the value of the limit. (The
results in Exercise 2.3.1 may be assumed.)


Exercise 2.5.7. Extend the result proved in Example 2.5.3 to the case $|b| < 1$;
that is, show $\lim(b^n) = 0$ if and only if $-1 < b < 1$.


Exercise 2.5.8. Another way to prove the Bolzano-Weierstrass Theorem is to
show that every sequence contains a monotone subsequence. A useful device in
this endeavor is the notion of a peak term. Given a sequence $(x_n)$, a particular
term $x_m$ is a peak term if no later term in the sequence exceeds it; i.e., if
$x_m \ge x_n$ for all $n \ge m$.

(a) Find examples of sequences with zero, one, and two peak terms. Find
an example of a sequence with infinitely many peak terms that is not
monotone.

(b) Show that every sequence contains a monotone subsequence and explain
how this furnishes a new proof of the Bolzano-Weierstrass Theorem.

Exercise 2.5.9. Let $(a_n)$ be a bounded sequence, and define the set

\[ S = \{x \in \mathbf{R} : x < a_n \text{ for infinitely many terms } a_n\}. \]

Show that there exists a subsequence $(a_{n_k})$ converging to $s = \sup S$. (This
is a direct proof of the Bolzano-Weierstrass Theorem using the Axiom of
Completeness.)


2.6  The  Cauchy  Criterion 


The following definition bears a striking resemblance to the definition of
convergence for a sequence.

Definition 2.6.1. A sequence $(a_n)$ is called a Cauchy sequence if, for every
$\epsilon > 0$, there exists an $N \in \mathbf{N}$ such that whenever $m, n \ge N$
it follows that $|a_n - a_m| < \epsilon$.

To make the comparison easier, let's restate the definition of convergence.

Definition 2.2.3. A sequence $(a_n)$ converges to a real number $a$ if, for every
$\epsilon > 0$, there exists an $N \in \mathbf{N}$ such that whenever $n \ge N$ it
follows that $|a_n - a| < \epsilon$.

As we have discussed, the definition of convergence asserts that, given an
arbitrary positive $\epsilon$, it is possible to find a point in the sequence after
which the terms of the sequence are all closer to the limit $a$ than the given
$\epsilon$. On the other hand, a sequence is a Cauchy sequence if, for every
$\epsilon$, there is a point in the sequence after which the terms are all closer
to each other than the given $\epsilon$. To spoil the surprise, we will argue in
this section that in fact these two definitions are equivalent: Convergent
sequences are Cauchy sequences, and Cauchy sequences converge. The significance
of the definition of a Cauchy sequence is that there is no mention of a limit.
This is somewhat like the situation with the Monotone Convergence Theorem in that
we will have another way of proving that sequences converge without having any
explicit knowledge of what the limit might be.
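
Because the Cauchy condition refers only to distances between terms, it can be explored without knowing any limit. The following sketch is our own illustration (the names `partial_sums` and `window_spread` are hypothetical): it measures how far apart the terms $x_N, \ldots, x_{2N}$ are for the partial sums of $\sum 1/n^2$ and of the harmonic series. A finite window can only suggest, never prove, that a sequence is or is not Cauchy.

```python
from fractions import Fraction
from itertools import accumulate

def partial_sums(term, count):
    """Return [s_1, ..., s_count] for the series whose n-th term is term(n)."""
    return list(accumulate(term(n) for n in range(1, count + 1)))

def window_spread(seq, N, M):
    """Largest |x_m - x_n| over N <= m, n <= M, using 1-based indices.

    Assumption: only the finite window from N to M is inspected.
    """
    window = seq[N - 1:M]
    return max(window) - min(window)

if __name__ == "__main__":
    squares = partial_sums(lambda n: Fraction(1, n * n), 400)
    harmonic = partial_sums(lambda n: Fraction(1, n), 400)
    for N in (1, 10, 100, 200):
        print(N, float(window_spread(squares, N, 2 * N)),
              float(window_spread(harmonic, N, 2 * N)))
```

For the first sequence the spread over $[N, 2N]$ stays below $1/N$, while for the harmonic partial sums it never drops below $1/2$, which is the grouping estimate of Example 2.4.5 in another guise.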

Theorem  2.6.2.  Every  convergent  sequence  is  a  Cauchy  sequence. 

Proof. Assume $(x_n)$ converges to $x$. To prove that $(x_n)$ is Cauchy, we must
find a point in the sequence after which we have $|x_n - x_m| < \epsilon$. This can
be done using an application of the triangle inequality. The details are requested
in Exercise 2.6.1. □


The  converse  is  a  bit  more  difficult  to  prove,  mainly  because,  in  order  to  prove 
that  a  sequence  converges,  we  must  have  a  proposed  limit  for  the  sequence  to 
approach.  We  have  been  in  this  situation  before  in  the  proofs  of  the  Monotone 
Convergence  Theorem  and  the  Bolzano-Weierstrass  Theorem.  Our  strategy 
here  will  be  to  use  the  Bolzano-Weierstrass  Theorem.  This  is  the  reason  for  the 
next  lemma.  (Compare  this  with  Theorem  2.3.2.) 

Lemma  2.6.3.  Cauchy  sequences  are  bounded. 


Proof. Given $\epsilon = 1$, there exists an $N$ such that $|x_m - x_n| < 1$ for all
$m, n \ge N$. Thus, we must have $|x_n| < |x_N| + 1$ for all $n \ge N$. It follows
that

\[ M = \max\{|x_1|, |x_2|, |x_3|, \ldots, |x_{N-1}|, |x_N| + 1\} \]

is a bound for the sequence $(x_n)$. □

Theorem  2.6.4  (Cauchy  Criterion).  A  sequence  converges  if  and  only  if  it 
is  a  Cauchy  sequence. 

Proof. ($\Rightarrow$) This direction is Theorem 2.6.2.

($\Leftarrow$) For this direction, we start with a Cauchy sequence $(x_n)$.
Lemma 2.6.3 guarantees that $(x_n)$ is bounded, so we may use the
Bolzano-Weierstrass Theorem to produce a convergent subsequence $(x_{n_k})$. Set

\[ x = \lim x_{n_k}. \]

The idea is to show that the original sequence $(x_n)$ converges to this same limit.
Once again, we will use a triangle inequality argument. We know the terms
in the subsequence are getting close to the limit $x$, and the assumption that
$(x_n)$ is Cauchy implies the terms in the "tail" of the sequence are close to each
other. Thus, we want to make each of these distances less than half of the
prescribed $\epsilon$.

Let $\epsilon > 0$. Because $(x_n)$ is Cauchy, there exists $N$ such that

\[ |x_n - x_m| < \frac{\epsilon}{2} \]

whenever $m, n \ge N$. Now, we also know that $(x_{n_k}) \to x$, so choose a term in
this subsequence, call it $x_{n_K}$, with $n_K \ge N$ and

\[ |x_{n_K} - x| < \frac{\epsilon}{2}. \]

To see that $N$ has the desired property (for the original sequence $(x_n)$),
observe that if $n \ge N$, then

\[
\begin{aligned}
|x_n - x| &= |x_n - x_{n_K} + x_{n_K} - x| \\
          &\le |x_n - x_{n_K}| + |x_{n_K} - x| \\
          &< \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.
\end{aligned}
\]

□


The  Cauchy  Criterion  is  named  after  the  French  mathematician  Augustin 
Louis Cauchy. Cauchy is a major figure in the history of many branches of
mathematics (number theory and the theory of finite groups, to name a few),
but he is most widely recognized for his enormous contributions in analysis,
especially complex analysis. He is deservedly credited with inventing the
$\epsilon$-based definition of limits we use today, although it is probably better to view
him  as  a  pioneer  of  analysis  in  the  sense  that  his  work  did  not  attain  the  level 
of  refinement  that  modern  mathematicians  have  come  to  expect.  The  Cauchy 
Criterion,  for  instance,  was  devised  and  used  by  Cauchy  to  study  infinite  series, 
but  he  never  actually  proved  it  in  both  directions.  The  fact  that  there  were 
gaps  in  Cauchy’s  work  should  not  diminish  his  brilliance  in  any  way.  The 
issues  of  the  day  were  both  difficult  and  subtle,  and  Cauchy  was  far  and  away 
the  most  influential  in  laying  the  groundwork  for  modern  standards  of  rigor. 
Karl  Weierstrass  played  a  major  role  in  sharpening  Cauchy’s  arguments.  We 
will  hear  a  good  deal  more  from  Weierstrass,  most  notably  in  Chapter  6  when 
we  take  up  uniform  convergence.  Bernhard  Bolzano  was  working  in  Prague 
and  was  writing  and  thinking  about  many  of  these  same  issues  surrounding 
limits  and  continuity.  Because  his  work  was  not  widely  available  to  the  rest 
of  the  mathematical  community,  his  historical  reputation  never  achieved  the 
distinction  that  his  impressive  accomplishments  would  seem  to  merit. 


Completeness  Revisited 

In the first chapter, we established the Axiom of Completeness (AoC) to be the
assertion that nonempty sets bounded above have least upper bounds. We then
used this axiom as the crucial step in the proof of the Nested Interval Property
(NIP). In this chapter, AoC was the central step in the Monotone Convergence
Theorem (MCT), and NIP was the key to proving the Bolzano-Weierstrass
Theorem (BW). Finally, we needed BW in our proof of the Cauchy Criterion
(CC) for convergent sequences. The list of implications then looks like

\[
\mathrm{AoC} \Rightarrow
\begin{cases}
\mathrm{NIP} \Rightarrow \mathrm{BW} \Rightarrow \mathrm{CC} \\
\mathrm{MCT}.
\end{cases}
\]

But  this  one-directional  list  is  not  the  whole  story.  Recall  that  in  our  original 
discussions  about  completeness,  the  fundamental  problem  was  that  the  rational 
numbers  contained  “gaps.”  The  reason  for  moving  from  the  rational  numbers 
to the real numbers to do analysis is so that when we encounter a sequence that
looks as if it is converging to some number, say $\sqrt{2}$, then we can be assured
that there is indeed a number there that we can call the limit. The assertion
that  “nonempty  sets  bounded  above  have  least  upper  bounds”  is  simply  one 
way  to  mathematically  articulate  our  insistence  that  there  be  no  “holes”  in  our 
ordered  field,  but  it  is  not  the  only  way.  Instead,  we  could  have  taken  MCT  to 
be  our  defining  axiom  and  used  it  to  prove  NIP  and  the  existence  of  least  upper 
bounds.  This  is  the  content  of  Exercise  2.4.4. 

How  about  NIP?  Could  this  property  serve  as  a  starting  point  for  a  proper 
axiomatic  treatment  of  the  real  numbers?  Almost.  In  Exercise  2.5.4  we  showed 
that  NIP  implies  AoC,  but  to  prevent  the  argument  from  making  implicit  use 
of  AoC  we  needed  an  extra  assumption  that  is  equivalent  to  the  Archimedean 
Property  (Theorem  1.4.2).  This  extra  hypothesis  is  unavoidable.  Whereas  AoC 
and  MCT  can  both  be  used  to  prove  that  N  is  not  a  bounded  subset  of  R,  there 
is  no  way  to  prove  this  same  fact  starting  from  NIP.  The  upshot  is  that  NIP 
is  a  perfectly  reasonable  candidate  to  use  as  the  fundamental  axiom  of  the  real 
numbers  provided  that  we  also  include  the  Archimedean  Property  as  a  second 
unproven  assumption. 

In  fact,  if  we  assume  the  Archimedean  Property  holds,  then  AoC,  NIP,  MCT, 
BW,  and  CC  are  equivalent  in  the  sense  that  once  we  take  any  one  of  them  to 
be  true,  it  is  possible  to  derive  the  other  four.  However,  because  we  have  an 
example  of  an  ordered  field  that  is  not  complete — namely,  the  set  of  rational 
numbers — we  know  it  is  impossible  to  prove  any  of  them  using  only  the  field 
and  order  properties.  Just  how  we  decide  which  should  be  the  axiom  and  which 
then  become  theorems  depends  largely  on  preference  and  context,  and  in  the 
end  is  not  especially  significant.  What  is  important  is  that  we  understand  all  of 
these  results  as  belonging  to  the  same  family,  each  asserting  the  completeness 
of  R  in  its  own  particular  language. 

One  loose  end  in  this  conversation  is  the  curious  and  somewhat  unpredictable 
relationship  of  the  Archimedean  Property  to  these  other  results.  As  we  have 
mentioned,  the  Archimedean  Property  follows  as  a  consequence  of  AoC  as  well 
as  MCT,  but  not  from  NIP.  Starting  from  BW,  it  is  possible  to  prove  MCT  and 
thus  also  the  Archimedean  Property.  On  the  other  hand,  the  Cauchy  Criterion 
is  like  NIP  in  that  it  cannot  be  used  on  its  own  to  prove  the  Archimedean 
Property.1 

1 A  thorough  account  of  the  logical  dependence  between  these  various  results  can  be  found 
in  [23]. 
Exercises

Exercise  2.6.1.  Supply  a  proof  for  Theorem  2.6.2. 

Exercise  2.6.2.  Give  an  example  of  each  of  the  following,  or  argue  that  such 
a  request  is  impossible. 

(a)  A  Cauchy  sequence  that  is  not  monotone. 

(b)  A  Cauchy  sequence  with  an  unbounded  subsequence. 

(c)  A  divergent  monotone  sequence  with  a  Cauchy  subsequence. 

(d)  An  unbounded  sequence  containing  a  subsequence  that  is  Cauchy. 

Exercise  2.6.3.  If  (xn)  and  (yn)  are  Cauchy  sequences,  then  one  easy  way 
to prove that $(x_n + y_n)$ is Cauchy is to use the Cauchy Criterion. By
Theorem 2.6.4, $(x_n)$ and $(y_n)$ must be convergent, and the Algebraic Limit Theorem
then  implies  (xn  +  yn)  is  convergent  and  hence  Cauchy. 

(a)  Give  a  direct  argument  that  (xn  +  yn)  is  a  Cauchy  sequence  that  does  not 
use  the  Cauchy  Criterion  or  the  Algebraic  Limit  Theorem. 


(b)  Do  the  same  for  the  product  (xnyn). 

Exercise 2.6.4. Let $(a_n)$ and $(b_n)$ be Cauchy sequences. Decide whether each
of the following sequences is a Cauchy sequence, justifying each conclusion.

(a) $c_n = |a_n - b_n|$

(b) $c_n = (-1)^n a_n$

(c) $c_n = [[a_n]]$, where $[[x]]$ refers to the greatest integer less than or
equal to $x$.


Exercise 2.6.5. Consider the following (invented) definition: A sequence $(s_n)$
is pseudo-Cauchy if, for all $\epsilon > 0$, there exists an $N$ such that if
$n \ge N$, then $|s_{n+1} - s_n| < \epsilon$.

Decide which one of the following two propositions is actually true. Supply
a proof for the valid statement and a counterexample for the other.

(i)  Pseudo-Cauchy  sequences  are  bounded. 

(ii)  If  (xn)  and  (yn)  are  pseudo- Cauchy,  then  (xn  +  yn)  is  pseudo-Cauchy  as 
well. 


Exercise 2.6.6. Let's call a sequence $(a_n)$ quasi-increasing if for all
$\epsilon > 0$ there exists an $N$ such that whenever $n > m \ge N$ it follows that
$a_n > a_m - \epsilon$.

(a) Give an example of a sequence that is quasi-increasing but not monotone
or eventually monotone.

(b) Give an example of a quasi-increasing sequence that is divergent and not
monotone or eventually monotone.

(c) Is there an analogue of the Monotone Convergence Theorem for
quasi-increasing sequences? Give an example of a bounded, quasi-increasing
sequence that doesn't converge, or prove that no such sequence exists.

Exercise  2.6.7.  Exercises  2.4.4  and  2.5.4  establish  the  equivalence  of  the  Axiom 
of  Completeness  and  the  Monotone  Convergence  Theorem.  They  also  show  the 
Nested  Interval  Property  is  equivalent  to  these  other  two  in  the  presence  of  the 
Archimedean  Property. 

(a) Assume the Bolzano-Weierstrass Theorem is true and use it to construct a
proof of the Monotone Convergence Theorem without making any appeal
to the Archimedean Property. This shows that BW, AoC, and MCT are
all equivalent.

(b) Use the Cauchy Criterion to prove the Bolzano-Weierstrass Theorem, and
find the point in the argument where the Archimedean Property is implicitly
required. This establishes the final link in the equivalence of the five
characterizations of completeness discussed at the end of Section 2.6.

(c)  How  do  we  know  it  is  impossible  to  prove  the  Axiom  of  Completeness 
starting  from  the  Archimedean  Property? 

2.7  Properties  of  Infinite  Series 

Given an infinite series $\sum_{k=1}^{\infty} a_k$, it is important to keep a
clear distinction between

(i) the sequence of terms: $(a_1, a_2, a_3, \ldots)$ and

(ii) the sequence of partial sums: $(s_1, s_2, s_3, \ldots)$, where
$s_n = a_1 + a_2 + \cdots + a_n$.

The convergence of the series $\sum_{k=1}^{\infty} a_k$ is defined in terms of
the sequence $(s_n)$. Specifically, the statement

\[ \sum_{k=1}^{\infty} a_k = A \quad \text{means that} \quad \lim s_n = A. \]

It  is  for  this  reason  that  we  can  immediately  translate  many  of  our  results  from 
the  study  of  sequences  into  statements  about  the  behavior  of  infinite  series. 
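
The distinction between terms and partial sums can be made concrete with a short Python sketch. The choice $a_k = 1/2^k$ and the helper names `terms` and `partial_sums` are illustrative assumptions, not part of the text.

```python
from fractions import Fraction
from itertools import accumulate, islice


def terms():
    """Yield the terms a_k = 1/2^k for k = 1, 2, 3, ... (illustrative choice)."""
    k = 1
    while True:
        yield Fraction(1, 2**k)
        k += 1


def partial_sums(term_iter):
    """Turn the sequence of terms (a_k) into the sequence of partial sums (s_n)."""
    return accumulate(term_iter)


# First five terms a_1..a_5 and first five partial sums s_1..s_5.
first_terms = list(islice(terms(), 5))
first_sums = list(islice(partial_sums(terms()), 5))
print(first_terms)
print(first_sums)
```

The terms shrink toward 0 while the partial sums climb toward 1; it is the second sequence whose limit defines the value of the series.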

Theorem 2.7.1 (Algebraic Limit Theorem for Series). If
$\sum_{k=1}^{\infty} a_k = A$ and $\sum_{k=1}^{\infty} b_k = B$, then

(i) $\sum_{k=1}^{\infty} c a_k = cA$ for all $c \in \mathbf{R}$ and

(ii) $\sum_{k=1}^{\infty} (a_k + b_k) = A + B$.

Proof. (i) In order to show that $\sum_{k=1}^{\infty} c a_k = cA$, we must argue
that the sequence of partial sums

\[ t_m = c a_1 + c a_2 + c a_3 + \cdots + c a_m \]

converges to $cA$. But we are given that $\sum_{k=1}^{\infty} a_k$ converges to
$A$, meaning that the partial sums

\[ s_m = a_1 + a_2 + a_3 + \cdots + a_m \]

converge to $A$. Because $t_m = c s_m$, applying the Algebraic Limit Theorem for
sequences (Theorem 2.3.3) yields $(t_m) \to cA$, as desired.

The  proof  of  part  (ii)  is  analogous  and  is  left  as  an  unofficial  exercise.  □ 

One  way  to  summarize  Theorem  2.7.1  (i)  is  to  say  that  infinite  addition  still 
satisfies  the  distributive  property.  Part  (ii)  verifies  that  series  can  be  added  in 
the  usual  way.  Missing  from  this  theorem  is  any  statement  about  the  product  of 
two  infinite  series.  At  the  heart  of  this  question  is  the  issue  of  commutativity, 
which  requires  a  more  delicate  analysis  and  so  is  postponed  until  Section  2.8. 

Theorem 2.7.2 (Cauchy Criterion for Series). The series
$\sum_{k=1}^{\infty} a_k$ converges if and only if, given $\epsilon > 0$, there
exists an $N \in \mathbf{N}$ such that whenever $n > m \ge N$ it follows that

\[ |a_{m+1} + a_{m+2} + \cdots + a_n| < \epsilon. \]

Proof. Observe that

\[ |s_n - s_m| = |a_{m+1} + a_{m+2} + \cdots + a_n| \]

and apply the Cauchy Criterion for sequences. □


The  Cauchy  Criterion  leads  to  economical  proofs  of  several  basic  facts  about 
series. 


Theorem 2.7.3. If the series $\sum_{k=1}^{\infty} a_k$ converges, then $(a_k) \to 0$.

Proof.  Consider  the  special  case  n  =  m  +  1  in  the  Cauchy  Criterion  for  Series. 

□ 
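
As a small elaboration of this one-line proof (added for clarity; the symbols are those of the Cauchy Criterion for Series), the special case can be written out as follows.

```latex
\[
  n = m + 1 \;\Longrightarrow\; |a_{m+1}| = |s_{m+1} - s_m| < \epsilon
  \quad \text{for all } m \ge N ,
\]
\[
  \text{so } |a_k - 0| < \epsilon \text{ whenever } k > N,
  \text{ which is the statement } (a_k) \to 0 .
\]
```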


Every  statement  of  this  result  should  be  accompanied  with  a  reminder  to 
look  at  the  harmonic  series  (Example  2.4.5)  to  erase  any  misconception  that  the 
converse  statement  is  true.  Knowing  (ak)  tends  to  0  does  not  imply  that  the 
series  converges. 
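
A numerical illustration of this warning, as a rough sketch: the terms $1/n$ of the harmonic series tend to 0, yet the partial sums pass any bound we name. The function names below are hypothetical, and floating-point arithmetic is assumed to be accurate enough for bounds of this size.

```python
def harmonic_partial_sum(n):
    """Return s_n = 1 + 1/2 + ... + 1/n using floating-point arithmetic."""
    return sum(1.0 / k for k in range(1, n + 1))


def first_index_exceeding(bound):
    """Return the smallest n for which the partial sum s_n exceeds `bound`."""
    total, n = 0.0, 0
    while total <= bound:
        n += 1
        total += 1.0 / n
    return n


for bound in (2.0, 5.0, 10.0):
    n = first_index_exceeding(bound)
    # The n-th term 1/n is tiny by the time the sum crosses the bound.
    print(bound, n, 1.0 / n)
```

The loop always terminates, however large the bound, which is the divergence of the series seen experimentally. The number of terms needed grows very quickly with the bound.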

Theorem 2.7.4 (Comparison Test). Assume $(a_k)$ and $(b_k)$ are sequences
satisfying $0 \le a_k \le b_k$ for all $k \in \mathbf{N}$.

(i) If $\sum_{k=1}^{\infty} b_k$ converges, then $\sum_{k=1}^{\infty} a_k$ converges.

(ii) If $\sum_{k=1}^{\infty} a_k$ diverges, then $\sum_{k=1}^{\infty} b_k$ diverges.

Proof. Both statements follow immediately from the Cauchy Criterion for Series
and the observation that

\[ |a_{m+1} + a_{m+2} + \cdots + a_n| \le |b_{m+1} + b_{m+2} + \cdots + b_n|. \]

Alternate proofs using the Monotone Convergence Theorem are requested in
the exercises. □


This is a good point to remind ourselves again that statements about
convergence of sequences and series are immune to changes in some finite number
of initial terms. In the Comparison Test, the requirement that $0 \le a_k \le b_k$
does not really need to hold for all $k \in \mathbf{N}$ but just needs to be
eventually true. A weaker, but sufficient, hypothesis would be to assume that
there exists some point $M \in \mathbf{N}$ such that the inequality
$a_k \le b_k$ is true for all $k \ge M$.

The  Comparison  Test  is  used  to  deduce  the  convergence  or  divergence  of  one 
series  based  on  the  behavior  of  another.  Thus,  for  this  test  to  be  of  any  great 
use,  we  need  a  catalog  of  series  we  can  use  as  measuring  sticks.  In  Section  2.4, 
we  proved  the  Cauchy  Condensation  Test,  which  led  to  the  general  statement 
that the series $\sum_{n=1}^{\infty} 1/n^p$ converges if and only if $p > 1$.

The  next  example  summarizes  the  situation  for  another  important  class  of 
series. 


Example 2.7.5 (Geometric Series). A series is called geometric if it is of
the form

\[ \sum_{k=0}^{\infty} a r^k = a + ar + ar^2 + ar^3 + \cdots. \]

If $r = 1$ and $a \ne 0$, the series evidently diverges. For $r \ne 1$, the
algebraic identity

\[ (1 - r)(1 + r + r^2 + r^3 + \cdots + r^{m-1}) = 1 - r^m \]

enables us to rewrite the partial sum

\[ s_m = a + ar + ar^2 + ar^3 + \cdots + ar^{m-1} = \frac{a(1 - r^m)}{1 - r}. \]

Now the Algebraic Limit Theorem (for sequences) and Example 2.5.3 justify
the conclusion

\[ \sum_{k=0}^{\infty} a r^k = \frac{a}{1 - r} \]

if and only if $|r| < 1$.
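
The partial-sum formula can be checked exactly with rational arithmetic. The sketch below uses the illustrative values $a = 3$ and $r = 1/2$ (my choice, not the text's) and hypothetical function names.

```python
from fractions import Fraction


def geometric_partial_sum(a, r, m):
    """s_m = a + a r + ... + a r^(m-1), added term by term."""
    return sum(a * r**k for k in range(m))


def closed_form(a, r, m):
    """The closed form a (1 - r^m) / (1 - r), valid only for r != 1."""
    return a * (1 - r**m) / (1 - r)


a, r = Fraction(3), Fraction(1, 2)
for m in (1, 5, 10):
    # Both computations are exact, so they must agree exactly.
    assert geometric_partial_sum(a, r, m) == closed_form(a, r, m)

limit = a / (1 - r)                              # a / (1 - r)
gap = limit - geometric_partial_sum(a, r, 10)    # equals a r^m / (1 - r)
print(limit, gap)
```

The gap between the limit and $s_m$ is $a r^m/(1 - r)$, which shrinks to 0 exactly when $|r| < 1$.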


Although the Comparison Test requires that the terms of the series be
positive, it is often used in conjunction with the next theorem to handle series
that contain some negative terms.

Theorem 2.7.6 (Absolute Convergence Test). If the series
$\sum_{n=1}^{\infty} |a_n|$ converges, then $\sum_{n=1}^{\infty} a_n$ converges
as well.

Proof. This proof makes use of both the necessity (the “only if” direction) and
the sufficiency (the “if” direction) of the Cauchy Criterion for Series. Because
$\sum_{n=1}^{\infty} |a_n|$ converges, we know that, given an $\epsilon > 0$,
there exists an $N \in \mathbf{N}$ such that

\[ |a_{m+1}| + |a_{m+2}| + \cdots + |a_n| < \epsilon \]

for all $n > m \ge N$. By the triangle inequality,

\[ |a_{m+1} + a_{m+2} + \cdots + a_n| \le |a_{m+1}| + |a_{m+2}| + \cdots + |a_n|, \]

so the sufficiency of the Cauchy Criterion guarantees that
$\sum_{n=1}^{\infty} a_n$ also converges. □


The converse of this theorem is false. In the opening discussion of this
chapter, we considered the alternating harmonic series

\[ 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots. \]

Taking absolute values of the terms gives us the harmonic series
$\sum_{n=1}^{\infty} 1/n$, which we have seen diverges. However, it is not too
difficult to prove that with the alternating negative signs the series indeed
converges. This is a special case of the Alternating Series Test.

Theorem 2.7.7 (Alternating Series Test). Let $(a_n)$ be a sequence satisfying,

(i) $a_1 \ge a_2 \ge a_3 \ge \cdots \ge a_n \ge a_{n+1} \ge \cdots$ and

(ii) $(a_n) \to 0$.

Then, the alternating series $\sum_{n=1}^{\infty} (-1)^{n+1} a_n$ converges.

Proof. A consequence of conditions (i) and (ii) is that $a_n \ge 0$. Several
proofs of this theorem are outlined in Exercise 2.7.1. □


Definition 2.7.8. If $\sum_{n=1}^{\infty} |a_n|$ converges, then we say that the
original series $\sum_{n=1}^{\infty} a_n$ converges absolutely. If, on the other
hand, the series $\sum_{n=1}^{\infty} a_n$ converges but the series of absolute
values $\sum_{n=1}^{\infty} |a_n|$ does not converge, then we say that the
original series $\sum_{n=1}^{\infty} a_n$ converges conditionally.

In terms of this newly defined jargon, we have shown that

\[ \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} \]

converges conditionally, whereas

\[ \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n^2}, \quad
   \sum_{n=1}^{\infty} \frac{1}{2^n}, \quad \text{and} \quad
   \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{2^n} \]

converge absolutely. In particular, any convergent series with (all but finitely
many) positive terms must converge absolutely.

The Alternating Series Test is the most accessible test for conditional
convergence, but several others are explored in the exercises. In particular,
Abel’s Test, outlined in Exercise 2.7.13, will prove useful in our
investigations of power series in Chapter 6.


Rearrangements 

Informally  speaking,  a  rearrangement  of  a  series  is  obtained  by  permuting  the 
terms  in  the  sum  into  some  other  order.  It  is  important  that  all  of  the  original 
terms  eventually  appear  in  the  new  ordering  and  that  no  term  gets  repeated. 
In  an  earlier  discussion  from  Section  2.1,  we  formed  a  rearrangement  of  the 
alternating  harmonic  series  by  taking  two  positive  terms  for  each  negative  term: 

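\[ 1 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} - \frac{1}{4} + \cdots. \]

There are clearly an infinite number of rearrangements of any sum; however, it
is helpful to see why neither

\[ 1 + \frac{1}{2} - \frac{1}{3} + \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots \]

nor

\[ 1 + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} + \frac{1}{7} - \frac{1}{8}
   + \frac{1}{9} + \frac{1}{11} - \frac{1}{12} + \cdots \]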

is  considered  a  rearrangement  of  the  original  alternating  harmonic  series. 


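Definition 2.7.9. Let $\sum_{k=1}^{\infty} a_k$ be a series. A series
$\sum_{k=1}^{\infty} b_k$ is called a rearrangement of $\sum_{k=1}^{\infty} a_k$
if there exists a one-to-one, onto function $f : \mathbf{N} \to \mathbf{N}$
such that $b_{f(k)} = a_k$ for all $k \in \mathbf{N}$.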

We now have all the tools and notation in place to resolve an issue raised
at the beginning of the chapter. In Section 2.1, we constructed a particular
rearrangement of the alternating harmonic series that converges to a limit
different from that of the original series. This happens because the
convergence is conditional.
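
A quick numerical experiment shows the two limits pulling apart. This is only a floating-point sketch with hypothetical function names; the reference values $\ln 2$ and $\tfrac{3}{2}\ln 2$ are standard facts about these two series rather than results established in this passage.

```python
import math


def alternating_harmonic(n_terms):
    """Partial sum of 1 - 1/2 + 1/3 - 1/4 + ... in its original order."""
    return sum((-1) ** (k + 1) / k for k in range(1, n_terms + 1))


def two_positive_one_negative(n_blocks):
    """Partial sum of the rearrangement 1 + 1/3 - 1/2 + 1/5 + 1/7 - 1/4 + ...

    Block j contributes 1/(4j-3) + 1/(4j-1) - 1/(2j), so every term of the
    original series is used exactly once as the number of blocks grows.
    """
    total = 0.0
    for j in range(1, n_blocks + 1):
        total += 1 / (4 * j - 3) + 1 / (4 * j - 1) - 1 / (2 * j)
    return total


original = alternating_harmonic(20000)
rearranged = two_positive_one_negative(10000)
print(original, math.log(2))
print(rearranged, 1.5 * math.log(2))
```

Same terms, different order, visibly different sums.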

Theorem  2.7.10.  If  a  series  converges  absolutely,  then  any  rearrangement  of 
this  series  converges  to  the  same  limit. 

Proof. Assume $\sum_{k=1}^{\infty} a_k$ converges absolutely to $A$, and let
$\sum_{k=1}^{\infty} b_k$ be a rearrangement of $\sum_{k=1}^{\infty} a_k$.
Let’s use

\[ s_n = \sum_{k=1}^{n} a_k = a_1 + a_2 + \cdots + a_n \]

for the partial sums of the original series and use

\[ t_m = \sum_{k=1}^{m} b_k = b_1 + b_2 + \cdots + b_m \]

for the partial sums of the rearranged series. Thus we want to show that
$(t_m) \to A$.

Let $\epsilon > 0$. By hypothesis, $(s_n) \to A$, so choose $N_1$ such that

\[ |s_n - A| < \frac{\epsilon}{2} \]

for all $n \ge N_1$. Because the convergence is absolute, we can choose $N_2$
so that

\[ \sum_{k=m+1}^{n} |a_k| < \frac{\epsilon}{2} \]

for all $n > m \ge N_2$. Now, take $N = \max\{N_1, N_2\}$. We know that the
finite set of terms $\{a_1, a_2, a_3, \ldots, a_N\}$ must all appear in the
rearranged series, and we want to move far enough out in the series
$\sum_{n=1}^{\infty} b_n$ so that we have included all of these terms. Thus,
choose

\[ M = \max\{f(k) : 1 \le k \le N\}. \]

It should now be evident that if $m \ge M$, then $(t_m - s_N)$ consists of a
finite set of terms, the absolute values of which appear in the tail
$\sum_{k=N+1}^{\infty} |a_k|$. Our choice of $N_2$ earlier then guarantees
$|t_m - s_N| < \epsilon/2$, and so

\[ |t_m - A| = |t_m - s_N + s_N - A|
   \le |t_m - s_N| + |s_N - A|
   < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon \]

whenever $m \ge M$. □
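
To see the theorem at work numerically, the sketch below applies a "two positive terms, then one negative term" rearrangement to the absolutely convergent series $\sum (-1)^{n+1}/n^2$. The function names are hypothetical, and the reference value $\pi^2/12$ is a standard fact quoted only for comparison.

```python
import math


def original_order(n_terms):
    """Partial sum of 1 - 1/4 + 1/9 - 1/16 + ... in its original order."""
    return sum((-1) ** (k + 1) / k**2 for k in range(1, n_terms + 1))


def rearranged_order(n_blocks):
    """Two positive terms, then one negative term, repeated n_blocks times."""
    total = 0.0
    for j in range(1, n_blocks + 1):
        total += 1 / (4 * j - 3) ** 2 + 1 / (4 * j - 1) ** 2 - 1 / (2 * j) ** 2
    return total


s = original_order(20000)
t = rearranged_order(10000)
print(s, t, math.pi**2 / 12)
```

Unlike the conditionally convergent alternating harmonic series, here the reordering leaves the limit unchanged, as Theorem 2.7.10 predicts.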

Exercises 

Exercise  2.7.1.  Proving  the  Alternating  Series  Test  (Theorem  2.7.7)  amounts 
to  showing  that  the  sequence  of  partial  sums 

\[ s_n = a_1 - a_2 + a_3 - \cdots \pm a_n \]

converges.  (The  opening  example  in  Section  2.1  includes  a  typical  illustration 
of  (sn).)  Different  characterizations  of  completeness  lead  to  different  proofs. 

(a)  Prove  the  Alternating  Series  Test  by  showing  that  (sn)  is  a  Cauchy 
sequence. 

(b)  Supply  another  proof  for  this  result  using  the  Nested  Interval  Property 
(Theorem  1.4.1). 

(c) Consider the subsequences $(s_{2n})$ and $(s_{2n+1})$, and show how the
Monotone Convergence Theorem leads to a third proof for the Alternating
Series Test.

Exercise 2.7.2. Decide whether each of the following series converges or
diverges:

(a) $\sum_{n=1}^{\infty} \frac{1}{2^n + n}$

(b) $\sum_{n=1}^{\infty} \frac{\sin(n)}{n^2}$

(c) $1 - \frac{3}{4} + \frac{4}{6} - \frac{5}{8} + \frac{6}{10} - \frac{7}{12} + \cdots$

(d) $1 + \frac{1}{2} - \frac{1}{3} + \frac{1}{4} + \frac{1}{5} - \frac{1}{6}
+ \frac{1}{7} + \frac{1}{8} - \frac{1}{9} + \cdots$

(e) $1 - \frac{1}{2^2} + \frac{1}{3} - \frac{1}{4^2} + \frac{1}{5}
- \frac{1}{6^2} + \frac{1}{7} - \frac{1}{8^2} + \cdots$

Exercise  2.7.3.  (a)  Provide  the  details  for  the  proof  of  the  Comparison  Test 

(Theorem  2.7.4)  using  the  Cauchy  Criterion  for  Series. 

(b)  Give  another  proof  for  the  Comparison  Test,  this  time  using  the  Monotone 
Convergence  Theorem. 

Exercise 2.7.4. Give an example of each or explain why the request is
impossible referencing the proper theorem(s).

(a) Two series $\sum x_n$ and $\sum y_n$ that both diverge but where
$\sum x_n y_n$ converges.

(b) A convergent series $\sum x_n$ and a bounded sequence $(y_n)$ such that
$\sum x_n y_n$ diverges.

(c) Two sequences $(x_n)$ and $(y_n)$ where $\sum x_n$ and $\sum (x_n + y_n)$
both converge but $\sum y_n$ diverges.

(d) A sequence $(x_n)$ satisfying $0 \le x_n \le 1/n$ where
$\sum (-1)^n x_n$ diverges.

Exercise  2.7.5.  Now  that  we  have  proved  the  basic  facts  about  geometric 
series,  supply  a  proof  for  Corollary  2.4.7. 

Exercise  2.7.6.  Let’s  say  that  a  series  subverges  if  the  sequence  of  partial 
sums  contains  a  subsequence  that  converges.  Consider  this  (invented)  definition 
for  a  moment,  and  then  decide  which  of  the  following  statements  are  valid 
propositions  about  subvergent  series: 

(a) If $(a_n)$ is bounded, then $\sum a_n$ subverges.

(b) All convergent series are subvergent.

(c) If $\sum |a_n|$ subverges, then $\sum a_n$ subverges as well.

(d) If $\sum a_n$ subverges, then $(a_n)$ has a convergent subsequence.

Exercise 2.7.7. (a) Show that if $a_n > 0$ and $\lim(n a_n) = l$ with
$l \ne 0$, then the series $\sum a_n$ diverges.

(b) Assume $a_n > 0$ and $\lim(n^2 a_n)$ exists. Show that $\sum a_n$ converges.

Exercise 2.7.8. Consider each of the following propositions. Provide short
proofs for those that are true and counterexamples for any that are not.

(a) If $\sum a_n$ converges absolutely, then $\sum a_n^2$ also converges
absolutely.

(b) If $\sum a_n$ converges and $(b_n)$ converges, then $\sum a_n b_n$ converges.

(c) If $\sum a_n$ converges conditionally, then $\sum n^2 a_n$ diverges.

Exercise 2.7.9 (Ratio Test). Given a series $\sum_{n=1}^{\infty} a_n$ with
$a_n \ne 0$, the Ratio Test states that if $(a_n)$ satisfies

\[ \lim \left| \frac{a_{n+1}}{a_n} \right| = r < 1, \]

then the series converges absolutely.

(a) Let $r'$ satisfy $r < r' < 1$. Explain why there exists an $N$ such that
$n \ge N$ implies $|a_{n+1}| \le |a_n| r'$.

(b) Why does $|a_N| \sum (r')^n$ converge?

(c) Now, show that $\sum |a_n|$ converges, and conclude that $\sum a_n$ converges.

Exercise  2.7.10  (Infinite  Products).  Review  Exercise  2.4.10  about  infinite 
products  and  then  answer  the  following  questions: 


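(a) Does $\frac{2}{1} \cdot \frac{3}{2} \cdot \frac{5}{4} \cdot \frac{9}{8}
\cdot \frac{17}{16} \cdots$ converge?

(b) The infinite product $\frac{1}{2} \cdot \frac{3}{4} \cdot \frac{5}{6}
\cdot \frac{7}{8} \cdot \frac{9}{10} \cdots$ certainly converges. (Why?) Does
it converge to zero?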


(c) In 1655, John Wallis famously derived the formula

\[ \left(\frac{2 \cdot 2}{1 \cdot 3}\right)
   \left(\frac{4 \cdot 4}{3 \cdot 5}\right)
   \left(\frac{6 \cdot 6}{5 \cdot 7}\right)
   \left(\frac{8 \cdot 8}{7 \cdot 9}\right) \cdots = \frac{\pi}{2}. \]

Show that the left side of this identity at least converges to something.
(A complete proof of this result is taken up in Section 8.3.)


Exercise 2.7.11. Find examples of two series $\sum a_n$ and $\sum b_n$ both of
which diverge but for which $\sum \min\{a_n, b_n\}$ converges. To make it more
challenging, produce examples where $(a_n)$ and $(b_n)$ are strictly positive
and decreasing.

Exercise 2.7.12 (Summation-by-parts). Let $(x_n)$ and $(y_n)$ be sequences, let
$s_n = x_1 + x_2 + \cdots + x_n$ and set $s_0 = 0$. Use the observation that
$x_j = s_j - s_{j-1}$ to verify the formula

\[ \sum_{j=m}^{n} x_j y_j = s_n y_{n+1} - s_{m-1} y_m
   + \sum_{j=m}^{n} s_j (y_j - y_{j+1}). \]

Exercise 2.7.13 (Abel’s Test). Abel’s Test for convergence states that if the
series $\sum_{k=1}^{\infty} x_k$ converges, and if $(y_k)$ is a sequence
satisfying

\[ y_1 \ge y_2 \ge y_3 \ge \cdots \ge 0, \]

then the series $\sum_{k=1}^{\infty} x_k y_k$ converges.

(a) Use Exercise 2.7.12 to show that

\[ \sum_{k=1}^{n} x_k y_k = s_n y_{n+1} + \sum_{k=1}^{n} s_k (y_k - y_{k+1}), \]

where $s_n = x_1 + x_2 + \cdots + x_n$.

(b) Use the Comparison Test to argue that
$\sum_{k=1}^{\infty} s_k (y_k - y_{k+1})$ converges absolutely, and show how
this leads directly to a proof of Abel’s Test.
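
Before proving the summation-by-parts formula of Exercise 2.7.12 (which Exercise 2.7.13 then relies on), it can be reassuring to test it on concrete numbers. The sketch below is a sanity check, not a proof; the sample sequences and the function name are arbitrary choices of mine.

```python
from fractions import Fraction


def summation_by_parts_sides(x, y, m, n):
    """Return (lhs, rhs) of the summation-by-parts formula for indices m..n.

    x and y are lists padded at index 0 so that x[j], y[j] match the text's
    x_j, y_j. The list y must extend to index n + 1.
    """
    s = [Fraction(0)]                      # s_0 = 0
    for j in range(1, n + 1):
        s.append(s[-1] + x[j])             # s_j = x_1 + ... + x_j
    lhs = sum(x[j] * y[j] for j in range(m, n + 1))
    rhs = (s[n] * y[n + 1] - s[m - 1] * y[m]
           + sum(s[j] * (y[j] - y[j + 1]) for j in range(m, n + 1)))
    return lhs, rhs


# Arbitrary sample data: x_j = (-1)^j / j and y_j = 1 / j^2.
x = [None] + [Fraction((-1) ** j, j) for j in range(1, 9)]
y = [None] + [Fraction(1, j * j) for j in range(1, 10)]

lhs, rhs = summation_by_parts_sides(x, y, 3, 8)
print(lhs == rhs)
```

Taking $m = 1$ makes the term $s_{m-1} y_m$ vanish, which is the form of the identity used in Exercise 2.7.13 (a).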

Exercise 2.7.14 (Dirichlet’s Test). Dirichlet’s Test for convergence states
that if the partial sums of $\sum_{k=1}^{\infty} x_k$ are bounded (but not
necessarily convergent), and if $(y_k)$ is a sequence satisfying
$y_1 \ge y_2 \ge y_3 \ge \cdots \ge 0$ with $\lim y_k = 0$, then the series
$\sum_{k=1}^{\infty} x_k y_k$ converges.

(a)  Point  out  how  the  hypothesis  of  Dirichlet’s  Test  differs  from  that  of  Abel’s 
Test  in  Exercise  2.7.13,  but  show  that  essentially  the  same  strategy  can 
be  used  to  provide  a  proof. 

(b)  Show  how  the  Alternating  Series  Test  (Theorem  2.7.7)  can  be  derived  as 
a  special  case  of  Dirichlet’s  Test. 

2.8  Double  Summations  and  Products 
of  Infinite  Series 

Given a doubly indexed array of real numbers $\{a_{ij} : i, j \in \mathbf{N}\}$,
we discovered in Section 2.1 that there is a dangerous ambiguity in how we might
define $\sum_{i,j=1}^{\infty} a_{ij}$. Performing the sum over first one of the
variables and then the other is referred to as an iterated summation. In our
specific example, summing the rows first and then taking the sum of these totals
produced a different result than first computing the sum of each column and
adding these sums together. In short,

\[ \sum_{j=1}^{\infty} \sum_{i=1}^{\infty} a_{ij}
   \ne \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{ij}. \]

There are still other ways to reasonably define $\sum_{i,j=1}^{\infty} a_{ij}$.
One natural idea is to calculate a kind of partial sum by adding together finite
numbers of terms in larger and larger “rectangles” in the array; that is, for
$m, n \in \mathbf{N}$, set

\[ s_{mn} = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}. \tag{1} \]

The order of the sum here is irrelevant because the sum is finite. Of particular
interest to our discussion are the sums $s_{nn}$ (sums over “squares”), which
form a legitimate sequence indexed by $n$ and thus can be subjected to our
arsenal of theorems and definitions. If the sequence $(s_{nn})$ converges, for
instance, we might wish to define

\[ \sum_{i,j=1}^{\infty} a_{ij} = \lim_{n \to \infty} s_{nn}. \]


Exercise 2.8.1. Using the particular array $(a_{ij})$ from Section 2.1, compute
$\lim_{n \to \infty} s_{nn}$. How does this value compare to the two iterated
values for the sum already computed?


There is a deep similarity between the issue of how to define a double
summation and the topic of rearrangements discussed at the end of Section 2.7.
Both relate to the commutativity of addition in an infinite setting. For
rearrangements, the resolution came with the added hypothesis of absolute
convergence, and it is not surprising that the same remedy applies for double
summations. Under the assumption of absolute convergence, each of the methods
discussed for computing the value of a double sum yields the same result.
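
As a hedged illustration of this claim, the sketch below uses a hypothetical absolutely convergent array, $a_{ij} = (-1)^{i+j}/(2^i 3^j)$, which is not the array from Section 2.1. For this array the row and column sums are geometric, so the common value can be worked out by hand as $(-1/3)(-1/4) = 1/12$. Infinite inner sums are approximated by long finite ones.

```python
def a(i, j):
    """Hypothetical array entry a_ij = (-1)^(i+j) / (2^i 3^j), for i, j >= 1."""
    return (-1) ** (i + j) / (2**i * 3**j)


def rectangle_sum(m, n):
    """The rectangular partial sum s_mn over rows 1..m and columns 1..n."""
    return sum(a(i, j) for i in range(1, m + 1) for j in range(1, n + 1))


def rows_first(m, inner=60):
    """Approximate each row sum by `inner` terms, then add the first m rows."""
    return sum(sum(a(i, j) for j in range(1, inner + 1))
               for i in range(1, m + 1))


def columns_first(n, inner=60):
    """Approximate each column sum by `inner` terms, then add n columns."""
    return sum(sum(a(i, j) for i in range(1, inner + 1))
               for j in range(1, n + 1))


square = rectangle_sum(30, 30)
by_rows = rows_first(40)
by_columns = columns_first(40)
print(square, by_rows, by_columns, 1 / 12)
```

All three procedures settle on the same number, in contrast to the Section 2.1 example where absolute convergence fails.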

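Exercise 2.8.2. Show that if the iterated series

\[ \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} |a_{ij}| \]

converges (meaning that for each fixed $i \in \mathbf{N}$ the series
$\sum_{j=1}^{\infty} |a_{ij}|$ converges to some real number $b_i$, and the
series $\sum_{i=1}^{\infty} b_i$ converges as well), then the iterated series

\[ \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{ij} \]

converges.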

Theorem 2.8.1. Let $\{a_{ij} : i, j \in \mathbf{N}\}$ be a doubly indexed array
of real numbers. If

\[ \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} |a_{ij}| \]

converges, then both $\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{ij}$ and
$\sum_{j=1}^{\infty} \sum_{i=1}^{\infty} a_{ij}$ converge to the same value.
Moreover,

\[ \lim_{n \to \infty} s_{nn}
   = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{ij}
   = \sum_{j=1}^{\infty} \sum_{i=1}^{\infty} a_{ij}, \]

where $s_{nn} = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}$.

Proof. In the same way that we defined the rectangular partial sums $s_{mn}$
above in equation (1), define

\[ t_{mn} = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|. \]

Exercise 2.8.3. (a) Prove that $(t_{nn})$ converges.

(b) Now, use the fact that $(t_{nn})$ is a Cauchy sequence to argue that
$(s_{nn})$ converges.

We can now set

\[ S = \lim_{n \to \infty} s_{nn}. \]

In order to prove the theorem, we must show that the two iterated sums converge
to this same limit. We will first show that

\[ S = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{ij}. \]

Because $\{t_{mn} : m, n \in \mathbf{N}\}$ is bounded above, we can let

\[ B = \sup\{t_{mn} : m, n \in \mathbf{N}\}. \]

Exercise 2.8.4. (a) Let $\epsilon > 0$ be arbitrary and argue that there exists
an $N_1 \in \mathbf{N}$ such that $m, n \ge N_1$ implies
$B - \frac{\epsilon}{2} < t_{mn} \le B$.

(b) Now, show that there exists an $N$ such that

\[ |s_{mn} - S| < \epsilon \]

for all $m, n \ge N$.


For the moment, consider $m \in \mathbf{N}$ to be fixed and write $s_{mn}$ as

\[ s_{mn} = \sum_{j=1}^{n} a_{1j} + \sum_{j=1}^{n} a_{2j} + \cdots
   + \sum_{j=1}^{n} a_{mj}. \]

Our hypothesis guarantees that for each fixed row $i$, the series
$\sum_{j=1}^{\infty} a_{ij}$ converges absolutely to some real number $r_i$.

Exercise 2.8.5. (a) Show that for all $m \ge N$

\[ |(r_1 + r_2 + \cdots + r_m) - S| \le \epsilon. \]

Conclude that the iterated sum $\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{ij}$
converges to $S$.

(b) Finish the proof by showing that the other iterated sum,
$\sum_{j=1}^{\infty} \sum_{i=1}^{\infty} a_{ij}$, converges to $S$ as well.
Notice that the same argument can be used once it is established that, for each
fixed column $j$, the sum $\sum_{i=1}^{\infty} a_{ij}$ converges to some real
number $c_j$. □

One final common way of computing a double summation is to sum along
diagonals where $i + j$ equals a constant. Given a doubly indexed array
$\{a_{ij} : i, j \in \mathbf{N}\}$, let

\[ d_2 = a_{11}, \quad d_3 = a_{12} + a_{21}, \quad
   d_4 = a_{13} + a_{22} + a_{31}, \]

and in general set

\[ d_k = a_{1,k-1} + a_{2,k-2} + \cdots + a_{k-1,1}. \]

Then, $\sum_{k=2}^{\infty} d_k$ represents another reasonable way of summing
over every $a_{ij}$ in the array.

Exercise 2.8.6. (a) Assuming the hypothesis (and hence the conclusion) of
Theorem 2.8.1, show that $\sum_{k=2}^{\infty} d_k$ converges absolutely.

(b) Imitate the strategy in the proof of Theorem 2.8.1 to show that
$\sum_{k=2}^{\infty} d_k$ converges to $S = \lim_{n \to \infty} s_{nn}$.


Products  of  Series 

Conspicuously missing from the Algebraic Limit Theorem for Series
(Theorem 2.7.1) is any statement about the product of two convergent series.
One way to formally carry out the algebra on such a product is to write

\[ \left( \sum_{i=1}^{\infty} a_i \right) \left( \sum_{j=1}^{\infty} b_j \right)
   = (a_1 + a_2 + a_3 + \cdots)(b_1 + b_2 + b_3 + \cdots) \]
\[ = a_1 b_1 + (a_1 b_2 + a_2 b_1) + (a_3 b_1 + a_2 b_2 + a_1 b_3) + \cdots
   = \sum_{k=2}^{\infty} d_k, \]

where

\[ d_k = a_1 b_{k-1} + a_2 b_{k-2} + \cdots + a_{k-1} b_1. \]


This particular form of the product, examined earlier in Exercise 2.8.6, is
called the Cauchy product of two series. Although there is something
algebraically natural about writing the product in this form, it may very well
be that computing the value of the sum is more easily done via one or the other
iterated summation. The question remains, then, as to how the value of the
Cauchy product, if it exists, is related to these other values of the double
sum. If the two series being multiplied converge absolutely, it is not too
difficult to prove that the sum may be computed in whatever way is most
convenient.
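
A small numerical sketch of the Cauchy product for two absolutely convergent series follows. The particular choices $a_i = 1/2^i$ and $b_j = 1/3^j$ (indexed from 1, so $A = 1$ and $B = 1/2$) and the helper name `cauchy_terms` are assumptions made for illustration.

```python
def cauchy_terms(a, b, K):
    """Return [d_2, ..., d_K], where d_k = a_1 b_(k-1) + ... + a_(k-1) b_1.

    `a` and `b` are functions giving the terms a_i and b_j for indices >= 1.
    """
    return [sum(a(i) * b(k - i) for i in range(1, k)) for k in range(2, K + 1)]


def a(i):
    return 1 / 2**i          # sum over i >= 1 is A = 1


def b(j):
    return 1 / 3**j          # sum over j >= 1 is B = 1/2


d = cauchy_terms(a, b, 80)
product_of_sums = 1.0 * 0.5  # A * B
print(sum(d), product_of_sums)
```

Summing along the diagonals reproduces the product $AB$ here, consistent with what Exercise 2.8.7 asks the reader to prove in general.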


Exercise 2.8.7. Assume that $\sum_{i=1}^{\infty} a_i$ converges absolutely to
$A$, and $\sum_{j=1}^{\infty} b_j$ converges absolutely to $B$.

(a) Show that the iterated sum
$\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} |a_i b_j|$ converges so that we may
apply Theorem 2.8.1.

(b) Let $s_{nn} = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i b_j$, and prove that
$\lim_{n \to \infty} s_{nn} = AB$. Conclude that

\[ \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_i b_j
   = \sum_{j=1}^{\infty} \sum_{i=1}^{\infty} a_i b_j
   = \sum_{k=2}^{\infty} d_k = AB, \]

where, as before, $d_k = a_1 b_{k-1} + a_2 b_{k-2} + \cdots + a_{k-1} b_1$.


2.9  Epilogue 


Theorems 2.7.10 and 2.8.1 make it clear that absolute convergence is an
extremely desirable quality to have when manipulating series. On the other
hand, the situation for conditionally convergent series is delightfully
pathological. In the case of rearrangements, not only are they no longer
guaranteed to converge to the same limit, but in fact if
$\sum_{n=1}^{\infty} a_n$ converges conditionally, then for any
$r \in \mathbf{R}$ there exists a rearrangement of $\sum_{n=1}^{\infty} a_n$
that converges to $r$. To see why, let’s look again at the alternating harmonic
series

\[ \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}
   = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots. \]

The negative terms taken alone form the series
$\sum_{n=1}^{\infty} (-1)/(2n)$. The partial sums of this series are precisely
$-1/2$ the partial sums of the harmonic series, and so march off (at half speed)
to negative infinity. A similar argument shows that the sum of positive terms
$\sum_{n=1}^{\infty} 1/(2n - 1)$ also diverges to infinity. It is
not  too  difficult  to  argue  that  this  situation  is  always  the  case  for  conditionally 
convergent  series.  Now,  let  r  be  some  proposed  limit,  which,  for  the  sake  of 
this  argument,  we  take  to  be  positive.  The  idea  is  to  take  as  many  positive 
terms  as  necessary  to  form  the  first  partial  sum  greater  than  r.  We  then  add 
negative  terms  until  the  partial  sum  falls  below  r,  at  which  point  we  switch  back 
to  positive  terms.  The  fact  that  there  is  no  bound  on  the  sums  of  either  the 
positive  terms  or  the  negative  terms  allows  this  process  to  continue  indefinitely. 
The  fact  that  the  terms  themselves  tend  to  zero  is  enough  to  guarantee  that  the 
partial  sums,  when  constructed  in  this  manner,  indeed  converge  to  r  as  they 
oscillate  around  this  target  value. 
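
The procedure just described is easy to simulate. The sketch below applies it to the alternating harmonic series with floating-point arithmetic; the function name, the tie-breaking rule (add a positive term when the partial sum equals $r$), and the step counts are my own choices.

```python
def rearrange_to_target(r, n_steps):
    """Greedily reorder 1 - 1/2 + 1/3 - ... so the partial sums home in on r.

    Returns (used, total): `used` lists signed denominators in the order
    chosen (k for +1/k, -k for -1/k) and `total` is the final partial sum.
    """
    next_odd, next_even = 1, 2     # next unused positive / negative term
    total = 0.0
    used = []
    for _ in range(n_steps):
        if total <= r:
            # At or below the target: take the next positive term 1/(odd).
            total += 1.0 / next_odd
            used.append(next_odd)
            next_odd += 2
        else:
            # Above the target: take the next negative term -1/(even).
            total -= 1.0 / next_even
            used.append(-next_even)
            next_even += 2
    return used, total


for target in (1.0, 2.0, -0.5):
    order, partial = rearrange_to_target(target, 200000)
    print(target, partial, order[:8])
```

Each term is used at most once and in its original relative order within the positive and negative groups, so the output is the beginning of a genuine rearrangement. The overshoot after each crossing is bounded by the size of the term that caused it, which is why the partial sums close in on the target.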

Perhaps the best way to summarize the situation is to say that the hypothesis
of absolute convergence essentially allows us to treat infinite sums as though
they were finite sums. This assessment extends to double sums as well, although
there are a few subtleties to address. In the case of products, we showed in
Exercise 2.8.7 that the Cauchy product of two absolutely convergent infinite
series converges to the product of the two factors, but in fact the same
conclusion follows if we only have absolute convergence in one of the two
original series. In the notation of Exercise 2.8.7, if $\sum a_n$ converges
absolutely to $A$, and if $\sum b_n$ converges (perhaps conditionally) to $B$,
then the Cauchy product $\sum d_k = AB$. On the other hand, if both $\sum a_n$
and $\sum b_n$ converge conditionally, then it is possible for the Cauchy
product to diverge. Squaring $\sum (-1)^n/\sqrt{n}$ provides an example of this
phenomenon. Of course, it is also possible to find $\sum a_n = A$ conditionally
and $\sum b_n = B$ conditionally whose Cauchy product $\sum d_k$ converges.
If this is the case, then the convergence is to the right value, namely
$\sum d_k = AB$. A proof of this last fact will be offered in Chapter 6
(Exercise 6.5.9), where we undertake the study of power series. Here is the
connection. A power series has the form $a_0 + a_1 x + a_2 x^2 + \cdots$. If we
multiply two power series together as though they were polynomials, then when
we collect common powers of $x$ the result is

\[ (a_0 + a_1 x + a_2 x^2 + \cdots)(b_0 + b_1 x + b_2 x^2 + \cdots) \]
\[ = a_0 b_0 + (a_0 b_1 + a_1 b_0) x + (a_0 b_2 + a_1 b_1 + a_2 b_0) x^2 + \cdots \]
\[ = d_0 + d_1 x + d_2 x^2 + \cdots, \]

which is the Cauchy product of $\sum a_n x^n$ and $\sum b_n x^n$. (The index
starts with $n = 0$ rather than $n = 1$.) Upcoming results about the good
behavior of power
series  will  lead  to  a  proof  that  convergent  Cauchy  products  sum  to  the  proper 
value.  In  the  other  direction,  Exercise  2.8.7  will  be  useful  in  establishing  a 
theorem  about  the  product  of  two  power  series. 
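
The divergent example mentioned above can be examined numerically. The sketch below assumes the series $\sum (-1)^n/\sqrt{n}$ is indexed from $n = 1$ and forms the Cauchy product terms $d_k$ as in Section 2.8; the function names are hypothetical.

```python
import math


def a(n):
    """Term of the conditionally convergent series: (-1)^n / sqrt(n), n >= 1."""
    return (-1) ** n / math.sqrt(n)


def d(k):
    """Cauchy product term d_k = a_1 a_(k-1) + a_2 a_(k-2) + ... + a_(k-1) a_1."""
    return sum(a(i) * a(k - i) for i in range(1, k))


sizes = [abs(d(k)) for k in range(2, 200)]
print(min(sizes), sizes[:5])
```

Every product $a_i a_{k-i}$ has the same sign $(-1)^k$ and size $1/\sqrt{i(k-i)} \ge 2/k$, so $|d_k| \ge 2(k-1)/k \ge 1$. The terms $d_k$ therefore do not tend to 0, and by Theorem 2.7.3 the Cauchy product cannot converge.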


Chapter  3 

Basic  Topology  of  R 


3.1  Discussion:  The  Cantor  Set 


What  follows  is  a  fascinating  mathematical  construction,  due  to  Georg  Cantor, 
which  is  extremely  useful  for  extending  the  horizons  of  our  intuition  about  the 
nature  of  subsets  of  the  real  line.  Cantor’s  name  has  already  appeared  in  the 
first  chapter  in  our  discussion  of  uncountable  sets.  Indeed,  Cantor’s  proof  that 
R  is  uncountable  occupies  another  spot  on  the  short  list  of  the  most  significant 
contributions  toward  understanding  the  mathematical  infinite.  In  the  words  of 
the  mathematician  David  Hilbert,  “No  one  shall  expel  us  from  the  paradise  that 
Cantor  has  created  for  us.” 

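Let $C_0$ be the closed interval $[0, 1]$, and define $C_1$ to be the set that
results when the open middle third is removed; that is,

\[ C_1 = C_0 \setminus \left( \frac{1}{3}, \frac{2}{3} \right)
   = \left[ 0, \frac{1}{3} \right] \cup \left[ \frac{2}{3}, 1 \right]. \]

Now, construct $C_2$ in a similar way by removing the open middle third of each
of the two components of $C_1$:

\[ C_2 = \left( \left[ 0, \frac{1}{9} \right] \cup \left[ \frac{2}{9}, \frac{1}{3} \right] \right)
   \cup \left( \left[ \frac{2}{3}, \frac{7}{9} \right] \cup \left[ \frac{8}{9}, 1 \right] \right). \]

If we continue this process inductively, then for each $n = 0, 1, 2, \ldots$ we
get a set $C_n$ consisting of $2^n$ closed intervals each having length
$1/3^n$. Finally, we define the Cantor set $C$ (Fig. 3.1) to be the intersection

\[ C = \bigcap_{n=0}^{\infty} C_n. \]

Figure 3.1: Defining the Cantor set; $C = \bigcap_{n=0}^{\infty} C_n$.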


It  may  be  useful  to  understand  C  as  the  remainder  of  the  interval  [0, 1]  after 
the  iterative  process  of  removing  open  middle  thirds  is  taken  to  infinity: 


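$$C = [0, 1] \setminus \left[\left(\tfrac{1}{3}, \tfrac{2}{3}\right) \cup \left(\tfrac{1}{9}, \tfrac{2}{9}\right) \cup \left(\tfrac{7}{9}, \tfrac{8}{9}\right) \cup \cdots\right].$$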


There is some initial doubt whether anything remains at all, but notice that
because we are always removing open middle thirds, for every $n \in \mathbf{N}$
we have $0 \in C_n$ and hence $0 \in C$. The same argument shows $1 \in C$. In
fact, if $y$ is the endpoint of some closed interval of some particular set
$C_n$, then it is also an endpoint of one of the intervals of $C_{n+1}$. Because,
at each stage, endpoints are never removed, it follows that $y \in C_n$ for all
$n$. Thus, $C$ at least contains the endpoints of all of the intervals that make
up each of the sets $C_n$.

Is there anything else? Is $C$ countable? Does $C$ contain any intervals? Any
irrational numbers? These are difficult questions at the moment. All of the
endpoints mentioned earlier are rational numbers (they have the form $m/3^n$),
which means that if it is true that $C$ consists of only these endpoints, then
$C$ would be a subset of $\mathbf{Q}$ and hence countable. We shall see about
this. There is some strong evidence that not much is left in $C$ if we consider
the total length of the intervals removed. To form $C_1$, an open interval of
length 1/3 was taken out. In the second step, we removed two intervals of length
1/9, and to construct $C_n$ we removed $2^{n-1}$ middle thirds of length
$1/3^n$. There is some logic, then, to defining the “length” of $C$ to be 1
minus the total

$$\frac{1}{3} + 2\left(\frac{1}{9}\right) + 4\left(\frac{1}{27}\right) + \cdots + 2^{n-1}\left(\frac{1}{3^n}\right) + \cdots = \frac{1/3}{1 - 2/3} = 1.$$

The Cantor set has zero length.
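The length bookkeeping can be checked exactly for any finite stage. This is a small sketch, assuming the counts described above ($2^{k-1}$ intervals of length $1/3^k$ removed at step $k$); the function names are illustrative only.

```python
from fractions import Fraction


def removed_length(n):
    """Total length removed in steps 1..n: the sum of 2^(k-1) / 3^k."""
    return sum(Fraction(2 ** (k - 1), 3 ** k) for k in range(1, n + 1))


def remaining_length(n):
    """Total length of C_n: 2^n closed intervals, each of length 1/3^n."""
    return Fraction(2 ** n, 3 ** n)
```

For every `n`, `removed_length(n) + remaining_length(n)` equals 1, and `remaining_length(n)` is $(2/3)^n$, which tends to zero. This is consistent with assigning $C$ length zero, although a finite computation is of course not a proof about the limit.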

To  this  point,  the  information  we  have  collected  suggests  a  mental  picture 
of  C  as  a  relatively  small,  thin  set.  For  these  reasons,  the  set  C  is  often  referred 
to  as  Cantor  “dust.”  But  there  are  some  strong  counterarguments  that  imply 
a very different picture. First, $C$ is actually uncountable, with cardinality
equal to the cardinality of $\mathbf{R}$. One slightly intuitive but convincing
way to see this is to create a 1-1 correspondence between $C$ and sequences of
the form $(a_n)_{n=1}^{\infty}$, where $a_n = 0$ or $1$. For each $c \in C$, set
$a_1 = 0$ if $c$ falls in the left-hand component of $C_1$ and set $a_1 = 1$ if
$c$ falls in the right-hand component. Having established where in $C_1$ the
point $c$ is located, there are now two possible components of $C_2$ that might
contain $c$. This time, we set $a_2 = 0$ or $1$ depending on whether $c$ falls in
the left or right half of these two components of $C_2$. Continuing in this way,
we come to see that every element $c \in C$ yields a sequence
$(a_1, a_2, a_3, \ldots)$ of zeros and ones that acts as a set of directions for
how to locate $c$ within $C$. Likewise, every such sequence corresponds to a
point in the Cantor set. Because the set of sequences of zeros and ones is
uncountable (Exercise 1.6.4), we must conclude that $C$ is uncountable as well.

Figure 3.2: Magnifying sets by a factor of 3.
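The "set of directions" idea can be made concrete for finite address strings. The sketch below is illustrative only (the name `interval_from_address` is ours); it follows a finite list of zeros and ones and returns the component of $C_n$ that those directions single out.

```python
from fractions import Fraction


def interval_from_address(bits):
    """Follow a finite list of directions (0 = left, 1 = right) and
    return the closed component (left, right) of C_n that they select,
    where n = len(bits)."""
    left, right = Fraction(0), Fraction(1)  # start in C_0 = [0, 1]
    for bit in bits:
        if bit not in (0, 1):
            raise ValueError("each direction must be 0 or 1")
        third = (right - left) / 3
        if bit == 0:
            right = left + third   # move into the left-hand third
        else:
            left = right - third   # move into the right-hand third
    return left, right
```

Each additional direction shrinks the selected interval by a factor of 3, so an infinite sequence of directions pins down a nested sequence of closed intervals whose lengths tend to zero. Distinct addresses of the same length select disjoint components, which is the finite shadow of the 1-1 correspondence described above.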

What  does  this  imply?  In  the  first  place,  because  the  endpoints  of  the 
approximating  sets  Cn  form  a  countable  set,  we  are  forced  to  accept  the  fact 
that  not  only  are  there  other  points  in  C  but  there  are  uncountably  many  of 
them. From the point of view of cardinality, $C$ is quite large, as large as
$\mathbf{R}$ in fact. This should be contrasted with the fact that from the
point of view of length, $C$ measures the same size as a single point. We
conclude this discussion with a demonstration that from the point of view of
dimension, $C$ strangely
falls  somewhere  in  between. 

There  is  a  sensible  agreement  that  a  point  has  dimension  zero,  a  line  segment 
has  dimension  one,  a  square  has  dimension  two,  and  a  cube  has  dimension  three. 
Without  attempting  a  formal  definition  of  dimension  (of  which  there  are  several), 
we  can  nevertheless  get  a  sense  of  how  one  might  be  defined  by  observing  how 
the  dimension  affects  the  result  of  magnifying  each  particular  set  by  a  factor 
of  3  (Fig.  3.2).  (The  reason  for  the  choice  of  3  will  become  clear  when  we  turn 
our  attention  back  to  the  Cantor  set).  A  single  point  undergoes  no  change 
at  all,  whereas  a  line  segment  triples  in  length.  For  the  square,  magnifying 
each  length  by  a  factor  of  3  results  in  a  larger  square  that  contains  9  copies 
of  the  original  square.  Finally,  the  magnified  cube  yields  a  cube  that  contains 
27  copies  of  the  original  cube  within  its  volume.  Notice  that,  in  each  case,  to 
compute  the  “size”  of  the  new  set,  the  dimension  appears  as  the  exponent  of 
the  magnification  factor. 
Figure 3.3: Dimension of $C$; $2 = 3^x \Rightarrow x = \log 2/\log 3$.


Now, apply this transformation to the Cantor set. The set $C_0 = [0, 1]$
becomes the interval $[0, 3]$. Deleting the middle third leaves
$[0, 1] \cup [2, 3]$, which is where we started in the original construction
except that we now stand to produce an additional copy of $C$ in the interval
$[2, 3]$. Magnifying the Cantor set by a factor of 3 yields two copies of the
original set. Thus, if $x$ is the dimension of $C$, then $x$ should satisfy
$2 = 3^x$, or $x = \log 2/\log 3 \approx 0.631$ (Fig. 3.3).
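The magnification heuristic amounts to solving (number of copies) = (magnification factor)$^x$ for $x$. A brief sketch, with the illustrative name `similarity_dimension`, applies it to the point, segment, square, cube, and Cantor set.

```python
import math


def similarity_dimension(copies, factor):
    """Solve copies = factor ** x for the exponent x."""
    return math.log(copies) / math.log(factor)


if __name__ == "__main__":
    examples = [("point", 1), ("segment", 3), ("square", 9),
                ("cube", 27), ("Cantor set", 2)]
    for name, copies in examples:
        print(name, round(similarity_dimension(copies, 3), 3))
```

This is only the informal scaling computation from the discussion, not one of the formal definitions of dimension alluded to in the text.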

The  notion  of  a  noninteger  or  fractional  dimension  is  the  impetus  behind 
the term “fractal,” coined in 1975 by Benoit Mandelbrot to describe a class
of  sets  whose  intricate  structures  have  much  in  common  with  the  Cantor  set. 
Cantor’s  construction,  however,  is  over  a  hundred  years  old  and  for  us  represents 
an  invaluable  testing  ground  for  the  upcoming  theorems  and  conjectures  about 
the  often  elusive  nature  of  subsets  of  the  real  line. 


3.2  Open  and  Closed  Sets 


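Given $a \in \mathbf{R}$ and $\epsilon > 0$, recall that the
$\epsilon$-neighborhood of $a$ is the set

$$V_\epsilon(a) = \{x \in \mathbf{R} : |x - a| < \epsilon\}.$$

In other words, $V_\epsilon(a)$ is the open interval
$(a - \epsilon, a + \epsilon)$, centered at $a$ with radius $\epsilon$.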


Definition 3.2.1. A set $O \subseteq \mathbf{R}$ is open if for all points
$a \in O$ there exists an $\epsilon$-neighborhood $V_\epsilon(a) \subseteq O$.

Example 3.2.2. (i) Perhaps the simplest example of an open set is $\mathbf{R}$
itself. Given an arbitrary element $a \in \mathbf{R}$, we are free to pick any
$\epsilon$-neighborhood we like and it will always be true that
$V_\epsilon(a) \subseteq \mathbf{R}$. It is also the case that the logical
structure of Definition 3.2.1 requires us to classify the empty set $\emptyset$
as an open subset of the real line.

(ii) For a more useful collection of examples, consider the open interval

$$(c, d) = \{x \in \mathbf{R} : c < x < d\}.$$

To see that $(c, d)$ is open in the sense just defined, let $x \in (c, d)$ be
arbitrary. If we take $\epsilon = \min\{x - c, d - x\}$, then it follows that
$V_\epsilon(x) \subseteq (c, d)$. It is important to see where this argument
breaks down if the interval includes either one of its endpoints.
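The choice of radius in part (ii) is easy to experiment with. Here is a minimal sketch (the function name `neighborhood_radius` is hypothetical) that computes $\min\{x - c, d - x\}$ for a point strictly inside the interval.

```python
from fractions import Fraction


def neighborhood_radius(x, c, d):
    """Return eps = min(x - c, d - x) for a point x of the open interval (c, d).

    The resulting neighborhood (x - eps, x + eps) lies inside (c, d).
    """
    if not (c < x < d):
        raise ValueError("x must lie strictly between c and d")
    return min(x - c, d - x)
```

For instance, `neighborhood_radius(Fraction(9, 10), 0, 1)` returns `Fraction(1, 10)`. The sketch refuses a point with `x == c` or `x == d`, since the formula would then produce the radius 0, which is not a legitimate choice of $\epsilon$.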

The union of open intervals is another example of an open set. This
observation leads to the next result.

Theorem  3.2.3.  (i)  The  union  of  an  arbitrary  collection  of  open  sets  is  open. 

(ii)  The  intersection  of  a  finite  collection  of  open  sets  is  open. 

Proof. To prove (i), we let $\{O_\lambda : \lambda \in \Lambda\}$ be a
collection of open sets and let $O = \bigcup_{\lambda \in \Lambda} O_\lambda$.
Let $a$ be an arbitrary element of $O$. In order to show that $O$ is open,
Definition 3.2.1 insists that we produce an $\epsilon$-neighborhood of $a$
completely contained in $O$. But $a \in O$ implies that $a$ is an element of at
least one particular $O_{\lambda'}$. Because we are assuming $O_{\lambda'}$ is
open, we can use Definition 3.2.1 to assert that there exists
$V_\epsilon(a) \subseteq O_{\lambda'}$. The fact that $O_{\lambda'} \subseteq O$
allows us to conclude that $V_\epsilon(a) \subseteq O$. This completes the proof
of (i).

For (ii), let $\{O_1, O_2, \ldots, O_N\}$ be a finite collection of open sets.
Now, if $a \in \bigcap_{k=1}^{N} O_k$, then $a$ is an element of each of the
open sets. By the definition of an open set, we know that, for each
$1 \leq k \leq N$, there exists $V_{\epsilon_k}(a) \subseteq O_k$. We are in
search of a single $\epsilon$-neighborhood of $a$ that is contained in every
$O_k$, so the trick is to take the smallest one. Letting
$\epsilon = \min\{\epsilon_1, \epsilon_2, \ldots, \epsilon_N\}$, it follows that
$V_\epsilon(a) \subseteq V_{\epsilon_k}(a)$ for all $k$, and hence
$V_\epsilon(a) \subseteq \bigcap_{k=1}^{N} O_k$, as desired. □
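The "take the smallest one" step in part (ii) can be sketched in code. We assume, purely for illustration, that each open set is given as an open interval `(c, d)` containing the point `a`; the name `common_radius` is ours.

```python
def common_radius(a, intervals):
    """Given finitely many open intervals (c, d) that all contain a, return
    a radius eps with (a - eps, a + eps) inside every one of them."""
    radii = []
    for c, d in intervals:
        if not (c < a < d):
            raise ValueError("a must belong to every interval")
        radii.append(min(a - c, d - a))  # a valid eps_k for the k-th set
    return min(radii)  # the smallest eps_k works for all k simultaneously
```

The final `min` is taken over a finite list, so it is attained and is strictly positive. That is exactly the point at which the finiteness of the collection enters the argument.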


Closed  Sets 


Definition 3.2.4. A point $x$ is a limit point of a set $A$ if every
$\epsilon$-neighborhood $V_\epsilon(x)$ of $x$ intersects the set $A$ at some
point other than $x$.

Limit points are also often referred to as “cluster points” or “accumulation
points,” but the phrase “$x$ is a limit point of $A$” has the advantage of
explicitly reminding us that $x$ is quite literally the limit of a sequence in
$A$.


Theorem 3.2.5. A point $x$ is a limit point of a set $A$ if and only if
$x = \lim a_n$ for some sequence $(a_n)$ contained in $A$ satisfying
$a_n \neq x$ for all $n \in \mathbf{N}$.


Proof. ($\Rightarrow$) Assume $x$ is a limit point of $A$. In order to produce a
sequence $(a_n)$ converging to $x$, we are going to consider the particular
$\epsilon$-neighborhoods obtained using $\epsilon = 1/n$. By Definition 3.2.4,
every neighborhood of $x$ intersects $A$ in some point other than $x$. This
means that, for each $n \in \mathbf{N}$, we are justified in picking a point

$$a_n \in V_{1/n}(x) \cap A$$

with the stipulation that $a_n \neq x$. It should not be too difficult to see
why $(a_n) \to x$. Given an arbitrary $\epsilon > 0$, choose $N$ such that
$1/N < \epsilon$. It follows that $|a_n - x| < \epsilon$ for all $n \geq N$.

($\Leftarrow$) For the reverse implication we assume $\lim a_n = x$ where
$a_n \in A$ but $a_n \neq x$, and let $V_\epsilon(x)$ be an arbitrary
$\epsilon$-neighborhood. The definition of convergence assures us that there
exists a term $a_N$ in the sequence satisfying $a_N \in V_\epsilon(x)$, and the
proof is complete. □

The restriction that $a_n \neq x$ in Theorem 3.2.5 deserves a comment. Given
a point $a \in A$, it is always the case that $a$ is the limit of a sequence in
$A$ if we are allowed to consider the constant sequence $(a, a, a, \ldots)$.
There will be
occasions  where  we  will  want  to  avoid  this  somewhat  uninteresting  situation,  so 
it  is  important  to  have  a  vocabulary  that  can  distinguish  limit  points  of  a  set 
from  isolated  points. 

Definition 3.2.6. A point $a \in A$ is an isolated point of $A$ if it is not a limit
point  of  A. 

As  a  word  of  caution,  we  need  to  be  a  little  careful  about  how  we  understand 
the  relationship  between  these  concepts.  Whereas  an  isolated  point  is  always 
an  element  of  the  relevant  set  A ,  it  is  quite  possible  for  a  limit  point  of  A  not 
to  belong  to  A.  As  an  example,  consider  the  endpoint  of  an  open  interval.  This 
situation  is  the  subject  of  the  next  important  definition. 

Definition 3.2.7. A set $F \subseteq \mathbf{R}$ is closed if it contains its limit points.

The  adjective  “closed”  appears  in  several  other  mathematical  contexts  and 
is  usually  employed  to  mean  that  an  operation  on  the  elements  of  a  given  set 
does  not  take  us  out  of  the  set.  In  linear  algebra,  for  example,  a  vector  space 
is  a  set  that  is  “closed”  under  addition  and  scalar  multiplication.  In  analysis, 
the  operation  we  are  concerned  with  is  the  limiting  operation.  Topologically 
speaking,  a  closed  set  is  one  where  convergent  sequences  within  the  set  have 
limits  that  are  also  in  the  set. 

Theorem 3.2.8. A set $F \subseteq \mathbf{R}$ is closed if and only if every
Cauchy sequence contained in $F$ has a limit that is also an element of $F$.

Proof.  Exercise  3.2.5.  □ 

Example 3.2.9. (i) Consider

$$A = \left\{\frac{1}{n} : n \in \mathbf{N}\right\}.$$

Let’s show that each point of $A$ is isolated. Given $1/n \in A$, choose
$\epsilon = 1/n - 1/(n+1)$. Then,

$$V_\epsilon(1/n) \cap A = \left\{\frac{1}{n}\right\}.$$

It follows from Definition 3.2.4 that $1/n$ is not a limit point and so is
isolated. Although all of the points of $A$ are isolated, the set does have
one limit point, namely 0. This is because every neighborhood centered
at zero, no matter how small, is going to contain points of $A$. Because
$0 \notin A$, $A$ is not closed. The set $F = A \cup \{0\}$ is an example of a
closed set and is called the closure of $A$. (The closure of a set is discussed
in a moment.)
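The claims in part (i) can be spot-checked with exact arithmetic. The sketch below uses hypothetical helper names and inspects only the points $1/m$ with $m$ up to a chosen bound, so it is an illustration rather than a proof.

```python
from fractions import Fraction


def isolating_radius(n):
    """eps = 1/n - 1/(n+1), the radius chosen in Example 3.2.9 (i)."""
    return Fraction(1, n) - Fraction(1, n + 1)


def neighbors_in_A(n, max_m=1000):
    """Points 1/m of A (with m <= max_m) lying in V_eps(1/n)."""
    eps = isolating_radius(n)
    center = Fraction(1, n)
    return [Fraction(1, m) for m in range(1, max_m + 1)
            if abs(Fraction(1, m) - center) < eps]


def point_near_zero(eps):
    """Return a point 1/m of A with 0 < 1/m < eps, for a rational eps > 0."""
    m = int(1 / Fraction(eps)) + 1  # m > 1/eps, hence 1/m < eps
    return Fraction(1, m)
```

For each `n` tried, `neighbors_in_A(n)` contains only `1/n` itself, while `point_near_zero(eps)` produces a point of $A$ inside any proposed neighborhood of 0. Together these illustrate why every point of $A$ is isolated and yet 0 is a limit point.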

(ii) Let’s prove that a closed interval

$$[c, d] = \{x \in \mathbf{R} : c \leq x \leq d\}$$

is a closed set using Definition 3.2.7. If $x$ is a limit point of $[c, d]$,
then by Theorem 3.2.5 there exists $(x_n) \subseteq [c, d]$ with $(x_n) \to x$.
We need to prove that $x \in [c, d]$.

The key to this argument is contained in the Order Limit Theorem
(Theorem 2.3.4), which summarizes the relationship between inequalities
and the limiting process. Because $c \leq x_n \leq d$, it follows from Theorem
2.3.4 (iii) that $c \leq x \leq d$ as well. Thus, $[c, d]$ is closed.


(iii) Consider the set $\mathbf{Q} \subseteq \mathbf{R}$ of rational numbers. An
extremely important property of $\mathbf{Q}$ is that its set of limit points is
actually all of $\mathbf{R}$. To see why this is so, recall Theorem 1.4.3 from
Chapter 1, which is referred to as the density property of $\mathbf{Q}$ in
$\mathbf{R}$.

Let $y \in \mathbf{R}$ be arbitrary, and consider any neighborhood
$V_\epsilon(y) = (y - \epsilon, y + \epsilon)$. Theorem 1.4.3 allows us to
conclude that there exists a rational number $r \neq y$ that falls in this
neighborhood. Thus, $y$ is a limit point of $\mathbf{Q}$.


The  density  property  of  Q  can  now  be  reformulated  in  the  following  way. 


Theorem 3.2.10 (Density of $\mathbf{Q}$ in $\mathbf{R}$). For every
$y \in \mathbf{R}$, there exists a sequence of rational numbers that converges
to $y$.


Proof. Combine the preceding discussion with Theorem 3.2.5. □
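One explicit way to produce such a sequence is to take $r_n = \lfloor ny \rfloor / n$, which satisfies $0 \leq y - r_n < 1/n$. The sketch below is one possible construction, not the argument used in the proof; it assumes the target is supplied as a Python float or integer (which is itself rational, so the code only illustrates the mechanism), and the function name is ours.

```python
from fractions import Fraction
import math


def rational_approximations(y, count):
    """Return r_1, ..., r_count with r_n = floor(n*y)/n, so 0 <= y - r_n < 1/n."""
    y_exact = Fraction(y)  # exact rational value of the float or int passed in
    return [Fraction(math.floor(n * y_exact), n) for n in range(1, count + 1)]
```

For example, `rational_approximations(math.sqrt(2), 10)` ends with `Fraction(7, 5)`, and the error of the n-th term is below `1/n`.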


The  same  argument  can  also  be  used  to  show  that  every  real  number  is  the 
limit  of  a  sequence  of  irrational  numbers.  Although  interesting,  part  of  the 
allure  of  the  rational  numbers  is  that,  in  addition  to  being  dense  in  R,  they  are 
countable.  As  we  will  see,  this  tangible  aspect  of  Q  makes  it  an  extremely  useful 
set,  both  for  proving  theorems  and  for  producing  interesting  counterexamples. 


Closure 

Definition 3.2.11. Given a set $A \subseteq \mathbf{R}$, let $L$ be the set of
all limit points of $A$. The closure of $A$ is defined to be
$\overline{A} = A \cup L$.

In Example 3.2.9 (i), we saw that if $A = \{1/n : n \in \mathbf{N}\}$, then the
closure of $A$ is $\overline{A} = A \cup \{0\}$. Example 3.2.9 (iii) verifies
that $\overline{\mathbf{Q}} = \mathbf{R}$. If $A$ is an open interval $(a, b)$,
then $\overline{A} = [a, b]$. If $A$ is a closed interval, then
$\overline{A} = A$. It is not for lack of imagination that in each of these
examples $\overline{A}$ is always a closed set.
Theorem 3.2.12. For any $A \subseteq \mathbf{R}$, the closure $\overline{A}$ is
a closed set and is the smallest closed set containing $A$.

Proof. If $L$ is the set of limit points of $A$, then it is immediately clear
that $\overline{A}$ contains the limit points of $A$. There is still something
more to prove, however, because taking the union of $L$ with $A$ could
potentially produce some new limit points of $\overline{A}$. In Exercise 3.2.7,
we outline the argument that this does not happen.

Now, any closed set containing $A$ must contain $L$ as well. This shows that
$\overline{A} = A \cup L$ is the smallest closed set containing $A$. □

Complements 

The  mathematical  notions  of  open  and  closed  are  not  antonyms  the  way  they  are 
in  standard  English.  If  a  set  is  not  open,  that  does  not  imply  it  must  be  closed. 
Many sets such as the half-open interval
$(c, d] = \{x \in \mathbf{R} : c < x \leq d\}$ are neither open nor closed. The
sets $\mathbf{R}$ and $\emptyset$ are both simultaneously open and closed
although, thankfully, these are the only ones with this disorienting property
(Exercise 3.2.13). There is, however, an important relationship between open
and closed sets. Recall that the complement of a set $A \subseteq \mathbf{R}$ is
defined to be the set

$$A^c = \{x \in \mathbf{R} : x \notin A\}.$$

Theorem 3.2.13. A set $O$ is open if and only if $O^c$ is closed. Likewise, a
set $F$ is closed if and only if $F^c$ is open.

Proof. Given an open set $O \subseteq \mathbf{R}$, let’s first prove that $O^c$
is a closed set. To prove $O^c$ is closed, we need to show that it contains all
of its limit points. If $x$ is a limit point of $O^c$, then every neighborhood
of $x$ contains some point of $O^c$. But that is enough to conclude that $x$
cannot be in the open set $O$ because $x \in O$ would imply that there exists a
neighborhood $V_\epsilon(x) \subseteq O$. Thus, $x \in O^c$, as desired.

For the converse statement, we assume $O^c$ is closed and argue that $O$ is
open. Thus, given an arbitrary point $x \in O$, we must produce an
$\epsilon$-neighborhood $V_\epsilon(x) \subseteq O$. Because $O^c$ is closed, we
can be sure that $x$ is not a limit point of $O^c$. Looking at the definition of
limit point, we see that this implies that there must be some neighborhood
$V_\epsilon(x)$ of $x$ that does not intersect the set $O^c$. But this means
$V_\epsilon(x) \subseteq O$, which is precisely what we needed to show.

The second statement in Theorem 3.2.13 follows quickly from the first using
the observation that $(E^c)^c = E$ for any set $E \subseteq \mathbf{R}$. □

The last theorem of this section should be compared to Theorem 3.2.3.

Theorem 3.2.14. (i) The union of a finite collection of closed sets is closed.

(ii) The intersection of an arbitrary collection of closed sets is closed.

Proof. De Morgan’s Laws state that for any collection of sets
$\{E_\lambda : \lambda \in \Lambda\}$ it is true that

$$\left(\bigcup_{\lambda \in \Lambda} E_\lambda\right)^c = \bigcap_{\lambda \in \Lambda} E_\lambda^c \quad \text{and} \quad \left(\bigcap_{\lambda \in \Lambda} E_\lambda\right)^c = \bigcup_{\lambda \in \Lambda} E_\lambda^c.$$

The result follows directly from these statements and Theorem 3.2.3. The
details are requested in Exercise 3.2.9. □
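De Morgan's Laws are easy to sanity-check on small finite examples. In the sketch below, a finite `universe` set stands in for $\mathbf{R}$ and complements are taken relative to it; both the function name and this finite setting are assumptions made for illustration, and the check is no substitute for the general proof requested in the exercises.

```python
def check_de_morgan(collection, universe):
    """Return True if both De Morgan identities hold for the given
    subsets of `universe`, with complements taken relative to `universe`."""
    union = set().union(*collection)
    intersection = set(universe).intersection(*collection)
    complements = [universe - E for E in collection]
    # (union of E)^c == intersection of the E^c
    first = (universe - union) == set(universe).intersection(*complements)
    # (intersection of E)^c == union of the E^c
    second = (universe - intersection) == set().union(*complements)
    return first and second
```

A call such as `check_de_morgan([{1, 2, 3}, {2, 3, 4}, {3, 5, 7}], set(range(10)))` returns `True`.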

Exercises 

Exercise 3.2.1. (a) Where in the proof of Theorem 3.2.3 part (ii) does the
assumption that the collection of open sets be finite get used?

(b) Give an example of a countable collection of open sets
$\{O_1, O_2, O_3, \ldots\}$ whose intersection $\bigcap_{n=1}^{\infty} O_n$ is
closed, not empty and not all of $\mathbf{R}$.

Exercise 3.2.2. Let

$$A = \left\{(-1)^n + \frac{2}{n} : n = 1, 2, 3, \ldots\right\} \quad \text{and} \quad B = \{x \in \mathbf{Q} : 0 < x < 1\}.$$

Answer the following questions for each set:

(a) What are the limit points?

(b) Is the set open? Closed?

(c) Does the set contain any isolated points?

(d) Find the closure of the set.

Exercise 3.2.3. Decide whether the following sets are open, closed, or neither.
If a set is not open, find a point in the set for which there is no
$\epsilon$-neighborhood contained in the set. If a set is not closed, find a
limit point that is not contained in the set.

(a) $\mathbf{Q}$.

(b) $\mathbf{N}$.

(c) $\{x \in \mathbf{R} : x \neq 0\}$.

(d) $\{1 + 1/4 + 1/9 + \cdots + 1/n^2 : n \in \mathbf{N}\}$.

(e) $\{1 + 1/2 + 1/3 + \cdots + 1/n : n \in \mathbf{N}\}$.
Exercise 3.2.4. Let $A$ be nonempty and bounded above so that $s = \sup A$
exists.

(a) Show that $s \in \overline{A}$.

(b)  Can  an  open  set  contain  its  supremum? 

Exercise  3.2.5.  Prove  Theorem  3.2.8. 

Exercise  3.2.6.  Decide  whether  the  following  statements  are  true  or  false. 
Provide  counterexamples  for  those  that  are  false,  and  supply  proofs  for  those 
that  are  true. 

(a)  An  open  set  that  contains  every  rational  number  must  necessarily  be  all 
of  R. 

(b)  The  Nested  Interval  Property  remains  true  if  the  term  “closed  interval”  is 
replaced  by  “closed  set.” 

(c)  Every  nonempty  open  set  contains  a  rational  number. 

(d)  Every  bounded  infinite  closed  set  contains  a  rational  number. 

(e)  The  Cantor  set  is  closed. 

Exercise 3.2.7. Given $A \subseteq \mathbf{R}$, let $L$ be the set of all limit
points of $A$.

(a) Show that the set $L$ is closed.

(b) Argue that if $x$ is a limit point of $A \cup L$, then $x$ is a limit point
of $A$. Use this observation to furnish a proof for Theorem 3.2.12.

Exercise  3.2.8.  Assume  A  is  an  open  set  and  B  is  a  closed  set.  Determine  if 
the  following  sets  are  definitely  open,  definitely  closed,  both,  or  neither. 

(a) $\overline{A \cup B}$

(b) $A \setminus B = \{x \in A : x \notin B\}$

(c) $(A^c \cup B)^c$

(d) $(A \cap B) \cup (A^c \cap B)$

(e) $\overline{A}^c \cap \overline{A^c}$

Exercise  3.2.9  (De  Morgan’s  Laws).  A  proof  for  De  Morgan’s  Laws  in  the 
case  of  two  sets  is  outlined  in  Exercise  1.2.5.  The  general  argument  is  similar. 
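(a) Given a collection of sets $\{E_\lambda : \lambda \in \Lambda\}$, show that

$$\left(\bigcup_{\lambda \in \Lambda} E_\lambda\right)^c = \bigcap_{\lambda \in \Lambda} E_\lambda^c \quad \text{and} \quad \left(\bigcap_{\lambda \in \Lambda} E_\lambda\right)^c = \bigcup_{\lambda \in \Lambda} E_\lambda^c.$$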

(b)  Now,  provide  the  details  for  the  proof  of  Theorem  3.2.14. 

Exercise  3.2.10.  Only  one  of  the  following  three  descriptions  can  be  realized. 
Provide  an  example  that  illustrates  the  viable  description,  and  explain  why  the 
other  two  cannot  exist. 

(i)  A  countable  set  contained  in  [0, 1]  with  no  limit  points. 

(ii)  A  countable  set  contained  in  [0, 1]  with  no  isolated  points. 

(iii)  A  set  with  an  uncountable  number  of  isolated  points. 

Exercise 3.2.11. (a) Prove that $\overline{A \cup B} = \overline{A} \cup \overline{B}$.

(b)  Does  this  result  about  closures  extend  to  infinite  unions  of  sets? 

Exercise 3.2.12. Let $A$ be an uncountable set and let $B$ be the set of real
numbers that divides $A$ into two uncountable sets; that is, $s \in B$ if both
$\{x : x \in A \text{ and } x < s\}$ and $\{x : x \in A \text{ and } x > s\}$
are uncountable. Show $B$ is nonempty and open.

Exercise  3.2.13.  Prove  that  the  only  sets  that  are  both  open  and  closed  are 
$\mathbf{R}$ and the empty set $\emptyset$.

Exercise  3.2.14.  A  dual  notion  to  the  closure  of  a  set  is  the  interior  of  a  set. 
The  interior  of  E  is  denoted  E°  and  is  defined  as 


$$E^\circ = \{x \in E : \text{there exists } V_\epsilon(x) \subseteq E\}.$$


Results about closures and interiors possess a useful symmetry.

(a) Show that $E$ is closed if and only if $\overline{E} = E$. Show that $E$ is
open if and only if $E^\circ = E$.

(b) Show that $\overline{E}^c = (E^c)^\circ$, and similarly that
$(E^\circ)^c = \overline{E^c}$.

Exercise 3.2.15. A set $A$ is called an $F_\sigma$ set if it can be written as
the countable union of closed sets. A set $B$ is called a $G_\delta$ set if it
can be written as the countable intersection of open sets.

(a) Show that a closed interval $[a, b]$ is a $G_\delta$ set.

(b) Show that the half-open interval $(a, b]$ is both a $G_\delta$ and an
$F_\sigma$ set.

(c) Show that $\mathbf{Q}$ is an $F_\sigma$ set, and the set of irrationals
$\mathbf{I}$ forms a $G_\delta$ set. (We will see in Section 3.5 that
$\mathbf{Q}$ is not a $G_\delta$ set, nor is $\mathbf{I}$ an $F_\sigma$ set.)

3.3  Compact  Sets


The  central  challenge  in  analysis  is  to  exploit  the  power  of  the  mathematical 
infinite — via  limits,  series,  derivatives,  integrals,  etc. — without  falling  victim  to 
erroneous  logic  or  faulty  intuition.  A  major  tool  for  maintaining  a  rigorous 
footing in this endeavor is the concept of compact sets. In ways that will
become clear, especially in our upcoming study of continuous functions, employing
compact  sets  in  a  proof  often  has  the  effect  of  bringing  a  finite  quality  to  the 
argument,  thereby  making  it  much  more  tractable. 

Definition 3.3.1 (Compactness). A set $K \subseteq \mathbf{R}$ is compact if
every sequence in $K$ has a subsequence that converges to a limit that is also
in $K$.


Example 3.3.2. The most basic example of a compact set is a closed interval.
To see this, notice that if $(a_n)$ is contained in an interval $[c, d]$, then
the Bolzano-Weierstrass Theorem guarantees that we can find a convergent
subsequence $(a_{n_k})$. Because a closed interval is a closed set
(Example 3.2.9 (ii)), we know that the limit of this subsequence is also in
$[c, d]$.


What  are  the  properties  of  closed  intervals  that  we  used  in  the  preceding 
argument? The Bolzano-Weierstrass Theorem requires boundedness, and we
used  the  fact  that  closed  sets  contain  their  limit  points.  As  we  are  about  to 
see,  these  two  properties  completely  characterize  compact  sets  in  R.  The  term 
“bounded”  has  thus  far  only  been  used  to  describe  sequences  (Definition  2.3.1), 
but  an  analogous  statement  can  easily  be  made  about  sets. 


Definition 3.3.3. A set $A \subseteq \mathbf{R}$ is bounded if there exists
$M > 0$ such that $|a| \leq M$ for all $a \in A$.


Theorem 3.3.4 (Characterization of Compactness in $\mathbf{R}$). A set
$K \subseteq \mathbf{R}$ is compact if and only if it is closed and bounded.


Proof.  Let  K  be  compact.  We  will  first  prove  that  K  must  be  bounded,  so 
assume,  for  contradiction,  that  K  is  not  a  bounded  set.  The  idea  is  to  produce 
a  sequence  in  K  that  marches  off  to  infinity  in  such  a  way  that  it  cannot  have  a 
convergent  subsequence  as  the  definition  of  compact  requires.  To  do  this,  notice 
that because $K$ is not bounded there must exist an element $x_1 \in K$
satisfying $|x_1| > 1$. Likewise, there must exist $x_2 \in K$ with
$|x_2| > 2$, and in general, given any $n \in \mathbf{N}$, we can produce
$x_n \in K$ such that $|x_n| > n$.

Now, because $K$ is assumed to be compact, $(x_n)$ should have a convergent
subsequence $(x_{n_k})$. But the elements of the subsequence must satisfy
$|x_{n_k}| > n_k$, and consequently $(x_{n_k})$ is unbounded. Because
convergent sequences are
bounded  (Theorem  2.3.2),  we  have  a  contradiction.  Thus,  K  must  at  least  be  a 
bounded  set. 

Next,  we  will  show  that  K  is  also  closed.  To  see  that  K  contains  its  limit 
points, we let $x = \lim x_n$, where $(x_n)$ is contained in $K$ and argue that
$x$ must be in $K$ as well. By Definition 3.3.1, the sequence $(x_n)$ has a
convergent subsequence $(x_{n_k})$, and by Theorem 2.5.2, we know $(x_{n_k})$
converges to the same limit $x$. Finally, Definition 3.3.1 requires that
$x \in K$. This proves that $K$ is
closed. 

The  proof  of  the  converse  statement  is  requested  in  Exercise  3.3.3.  □ 

There  may  be  a  temptation  to  consider  closed  intervals  as  being  a  kind  of 
standard  archetype  for  compact  sets,  but  this  is  misleading.  The  structure  of 
compact  sets  can  be  much  more  intricate  and  interesting.  For  instance,  one 
implication  of  Theorem  3.3.4  is  that  the  Cantor  set  is  compact.  It  is  more 
useful  to  think  of  compact  sets  as  generalizations  of  closed  intervals.  Whenever 
a  fact  involving  closed  intervals  is  true,  it  is  often  the  case  that  the  same  result 
holds  when  we  replace  “closed  interval”  with  “compact  set.”  As  an  example, 
let’s  experiment  with  the  Nested  Interval  Property  proved  in  the  first  chapter. 

Theorem 3.3.5 (Nested Compact Set Property). If

$$K_1 \supseteq K_2 \supseteq K_3 \supseteq K_4 \supseteq \cdots$$

is a nested sequence of nonempty compact sets, then the intersection
$\bigcap_{n=1}^{\infty} K_n$ is not empty.

Proof. In order to take advantage of the compactness of each $K_n$, we are
going to produce a sequence that is eventually in each of these sets. Thus, for
each $n \in \mathbf{N}$, pick a point $x_n \in K_n$. Because the compact sets
are nested, it follows that the sequence $(x_n)$ is contained in $K_1$. By
Definition 3.3.1, $(x_n)$ has a convergent subsequence $(x_{n_k})$ whose limit
$x = \lim x_{n_k}$ is an element of $K_1$.

In fact, $x$ is an element of every $K_n$ for essentially the same reason.
Given a particular $n_0 \in \mathbf{N}$, the terms in the sequence $(x_n)$ are
contained in $K_{n_0}$ as long as $n \geq n_0$. Ignoring the finite number of
terms for which $n_k < n_0$, the same subsequence $(x_{n_k})$ is then also
contained in $K_{n_0}$. The conclusion is that $x = \lim x_{n_k}$ is an element
of $K_{n_0}$. Because $n_0$ was arbitrary, $x \in \bigcap_{n=1}^{\infty} K_n$
and the proof is complete. □

Open  Covers 

Defining  compactness  for  sets  in  R  is  reminiscent  of  the  situation  we  encountered 
with  completeness  in  that  there  are  a  number  of  equivalent  ways  to  describe  this 
phenomenon.  We  demonstrated  the  equivalence  of  two  such  characterizations 
in  Theorem  3.3.4.  What  this  theorem  implies  is  that  we  could  have  decided  to 
define  compact  sets  to  be  sets  that  are  closed  and  bounded,  and  then  proved  that 
sequences  contained  in  compact  sets  have  convergent  subsequences  with  limits 
in  the  set.  There  are  some  larger  issues  involved  in  deciding  what  the  definition 
should  be,  but  what  is  important  at  this  moment  is  that  we  be  versatile  enough 
to  use  whatever  description  of  compactness  is  most  appropriate  for  a  given 
situation. 

Although  Theorem  3.3.4  is  sufficient  for  most  of  our  purposes,  there  is  a 
third  important  characterization  of  compactness,  equivalent  to  the  two  others, 
which  is  described  in  terms  of  open  covers  and  finite  subcovers. 
Definition 3.3.6. Let $A \subseteq \mathbf{R}$. An open cover for $A$ is a
(possibly infinite) collection of open sets
$\{O_\lambda : \lambda \in \Lambda\}$ whose union contains the set $A$; that is,
$A \subseteq \bigcup_{\lambda \in \Lambda} O_\lambda$. Given an open cover for
$A$, a finite subcover is a finite subcollection of open sets from the original
open cover whose union still manages to completely contain $A$.

Example 3.3.7. Consider the open interval $(0, 1)$. For each point
$x \in (0, 1)$, let $O_x$ be the open interval $(x/2, 1)$. Taken together, the
infinite collection $\{O_x : x \in (0, 1)\}$ forms an open cover for the open
interval $(0, 1)$. Notice, however, that it is impossible to find a finite
subcover. Given any proposed finite subcollection

$$\{O_{x_1}, O_{x_2}, \ldots, O_{x_n}\},$$

set $x' = \min\{x_1, x_2, \ldots, x_n\}$ and observe that any real number $y$
satisfying $0 < y \leq x'/2$ is not contained in the union
$\bigcup_{i=1}^{n} O_{x_i}$.


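[Figure: the sets $O_{x_1} = (x_1/2, 1)$ and $O_{x_2} = (x_2/2, 1)$ drawn over
the interval $(0,1)$, with the points $0$, $x_2/2$, $x_2$, $x_1/2$, $x_1$, and
$1$ marked.]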


Now, consider a similar cover for the closed interval $[0,1]$. For $x \in (0,1)$,
the sets $O_x = (x/2, 1)$ do a fine job covering $(0,1)$, but in order to have
an open cover of the closed interval $[0,1]$, we must also cover the endpoints.
To remedy this, we could fix $\epsilon > 0$, and let
$O_0 = (-\epsilon, \epsilon)$ and $O_1 = (1 - \epsilon, 1 + \epsilon)$. Then,
the collection

$$\{O_0, O_1, O_x : x \in (0,1)\}$$

is an open cover for $[0,1]$. But this time, notice there is a finite subcover.
Because of the addition of the set $O_0$, we can choose $x'$ so that
$x'/2 < \epsilon$. It follows that $\{O_0, O_{x'}, O_1\}$ is a finite subcover
for the closed interval $[0,1]$.
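
To make Example 3.3.7 concrete, here is a minimal Python sketch; it is an
illustration added here, not part of the original argument. The function names
`uncovered_point`, `is_covered`, and `finite_subcover_closed` are hypothetical,
and exact rational arithmetic (`fractions.Fraction`) is an implementation
choice made to avoid rounding. The first function produces a point of $(0,1)$
missed by any proposed finite subcollection; the last one builds the three-set
subcover of $[0,1]$.

```python
from fractions import Fraction


def uncovered_point(xs):
    """Given finitely many x in (0, 1), return a y in (0, 1) that lies in
    none of the sets O_x = (x/2, 1)."""
    x_min = min(xs)          # plays the role of x' in the example
    return x_min / 4         # 0 < y < x'/2 <= x/2 for every listed x


def is_covered(y, xs):
    """True if y belongs to at least one O_x = (x/2, 1) with x in xs."""
    return any(x / 2 < y < 1 for x in xs)


def finite_subcover_closed(eps):
    """For the cover {O_0, O_1, O_x} of [0, 1], return the three open
    intervals O_0, O_{x'}, O_1, where x' is chosen so that x'/2 < eps."""
    x_prime = min(eps, Fraction(1, 2))   # x' in (0, 1) and x'/2 < eps
    return [(-eps, eps), (x_prime / 2, Fraction(1)), (1 - eps, 1 + eps)]
```

For instance, `uncovered_point([Fraction(1, 2), Fraction(1, 3)])` returns a
point that `is_covered` rejects, mirroring the failure of every finite
subcollection on the open interval.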

Theorem 3.3.8 (Heine-Borel Theorem). Let $K$ be a subset of $\mathbf{R}$. All of
the following statements are equivalent in the sense that any one of them implies
the two others:

(i)  K  is  compact. 

(ii)  K  is  closed  and  bounded. 

(iii)  Every  open  cover  for  K  has  a  finite  subcover. 

Proof. The equivalence of (i) and (ii) is the content of Theorem 3.3.4. What
remains is to show that (iii) is equivalent to (i) and (ii). Let’s first assume
(iii), and prove that it implies (ii) (and thus (i) as well).

To show that $K$ is bounded, we construct an open cover for $K$ by defining
$O_x$ to be an open interval of radius 1 around each point $x \in K$. In the
language of neighborhoods, $O_x = V_1(x)$. The open cover $\{O_x : x \in K\}$
then must have a finite subcover $\{O_{x_1}, O_{x_2}, \ldots, O_{x_n}\}$.
Because $K$ is contained in a finite union of bounded sets, $K$ must itself be
bounded.

The proof that $K$ is closed is more delicate, and we argue it by contradiction.
Let $(y_n)$ be a Cauchy sequence contained in $K$ with $\lim y_n = y$. To show
that $K$ is closed, we must demonstrate that $y \in K$, so assume for
contradiction that this is not the case. If $y \notin K$, then every $x \in K$
is some positive distance away from $y$. We now construct an open cover by
taking $O_x$ to be an interval of radius $|x - y|/2$ around each point $x$ in
$K$. Because we are assuming (iii), the resulting open cover
$\{O_x : x \in K\}$ must have a finite subcover
$\{O_{x_1}, O_{x_2}, \ldots, O_{x_n}\}$. The contradiction arises when we
realize that, in the spirit of Example 3.3.7, this finite subcover cannot
contain all of the elements of the sequence $(y_n)$. To make this explicit, set

$$\epsilon_0 = \min\left\{ \frac{|x_i - y|}{2} : 1 \le i \le n \right\}.$$

Because $(y_n) \to y$, we can certainly find a term $y_N$ satisfying
$|y_N - y| < \epsilon_0$. But such a $y_N$ must necessarily be excluded from
each $O_{x_i}$, meaning that

$$y_N \notin \bigcup_{i=1}^{n} O_{x_i}.$$

Thus our supposed subcover does not actually cover all of $K$. This contradiction
implies that $y \in K$, and hence $K$ is closed and bounded.
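
The claim that the chosen term of the sequence is excluded from every set in
the finite subcover can be checked with the triangle inequality. As a sketch,
writing $y_N$ for that term, $x_i$ for the centers of the finitely many sets,
and $\epsilon_0$ for the minimum of the radii $|x_i - y|/2$ (all as in the
proof), each index $i$ satisfies:

```latex
|y_N - x_i| \;\ge\; |x_i - y| - |y - y_N|
           \;>\; |x_i - y| - \epsilon_0
           \;\ge\; |x_i - y| - \frac{|x_i - y|}{2}
           \;=\; \frac{|x_i - y|}{2}.
```

So $y_N$ lies farther from $x_i$ than the radius of the interval centered at
$x_i$, which is exactly the statement that $y_N$ is not in that interval.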

The  proof  that  (ii)  implies  (iii)  is  outlined  in  Exercise  3.3.9.  To  be  historically 
accurate,  it  is  this  particular  implication  that  is  most  appropriately  referred  to 
as  the  Heine-Borel  Theorem.  □ 


Exercises 

Exercise 3.3.1. Show that if $K$ is compact and nonempty, then $\sup K$ and
$\inf K$ both exist and are elements of $K$.

Exercise  3.3.2.  Decide  which  of  the  following  sets  are  compact.  For  those  that 
are  not  compact,  show  how  Definition  3.3.1  breaks  down.  In  other  words,  give 
an  example  of  a  sequence  contained  in  the  given  set  that  does  not  possess  a 
subsequence  converging  to  a  limit  in  the  set. 


(a) $\mathbf{N}$.

(b) $\mathbf{Q} \cap [0,1]$.

(c) The Cantor set.

(d) $\{1 + 1/2^2 + 1/3^2 + \cdots + 1/n^2 : n \in \mathbf{N}\}$.

(e) $\{1, 1/2, 2/3, 3/4, 4/5, \ldots\}$.



Exercise 3.3.3. Prove the converse of Theorem 3.3.4 by showing that if a set
$K \subseteq \mathbf{R}$ is closed and bounded, then it is compact.

Exercise  3.3.4.  Assume  K  is  compact  and  F  is  closed.  Decide  if  the  following 
sets  are  definitely  compact,  definitely  closed,  both,  or  neither. 

(a) $K \cap F$

(b) $\overline{F^c \cup K^c}$

(c) $K \setminus F = \{x \in K : x \notin F\}$

(d) $\overline{K \cap F^c}$

Exercise 3.3.5. Decide whether the following propositions are true or false.
If the claim is valid, supply a short proof, and if the claim is false, provide a
counterexample.

(a) The arbitrary intersection of compact sets is compact.

(b) The arbitrary union of compact sets is compact.

(c) Let $A$ be arbitrary, and let $K$ be compact. Then, the intersection
$A \cap K$ is compact.

(d) If $F_1 \supseteq F_2 \supseteq F_3 \supseteq F_4 \supseteq \cdots$ is a
nested sequence of nonempty closed sets, then the intersection
$\bigcap_{n=1}^{\infty} F_n \ne \emptyset$.

Exercise  3.3.6.  This  exercise  is  meant  to  illustrate  the  point  made  in  the 
opening  paragraph  to  Section  3.3.  Verify  that  the  following  three  statements 
are  true  if  every  blank  is  filled  in  with  the  word  “finite.”  Which  are  true  if  every 
blank  is  filled  in  with  the  word  “compact”?  Which  are  true  if  every  blank  is 
filled  in  with  the  word  “closed”? 

(a) Every ____ set has a maximum.

(b) If $A$ and $B$ are ____, then $A + B = \{a + b : a \in A, b \in B\}$ is
also ____.

(c) If $\{A_n : n \in \mathbf{N}\}$ is a collection of ____ sets with the
property that every finite subcollection has a nonempty intersection, then
$\bigcap_{n=1}^{\infty} A_n$ is nonempty as well.

Exercise 3.3.7. As some more evidence of the surprising nature of the Cantor
set, follow these steps to show that the sum $C + C = \{x + y : x, y \in C\}$ is
equal to the closed interval $[0,2]$. (Keep in mind that $C$ has zero length and
contains no intervals.)

Because $C \subseteq [0,1]$, $C + C \subseteq [0,2]$, so we only need to prove
the reverse inclusion $[0,2] \subseteq \{x + y : x, y \in C\}$. Thus, given
$s \in [0,2]$, we must find two elements $x, y \in C$ satisfying $x + y = s$.

(a) Show that there exist $x_1, y_1 \in C_1$ for which $x_1 + y_1 = s$. Show in
general that, for an arbitrary $n \in \mathbf{N}$, we can always find
$x_n, y_n \in C_n$ for which $x_n + y_n = s$.

(b) Keeping in mind that the sequences $(x_n)$ and $(y_n)$ do not necessarily
converge, show how they can nevertheless be used to produce the desired
$x$ and $y$ in $C$ satisfying $x + y = s$.

Exercise 3.3.8. Let $K$ and $L$ be nonempty compact sets, and define

$$d = \inf\{|x - y| : x \in K \text{ and } y \in L\}.$$

This turns out to be a reasonable definition for the distance between $K$ and
$L$.

(a) If $K$ and $L$ are disjoint, show $d > 0$ and that $d = |x_0 - y_0|$ for
some $x_0 \in K$ and $y_0 \in L$.

(b) Show that it’s possible to have $d = 0$ if we assume only that the disjoint
sets $K$ and $L$ are closed.

Exercise 3.3.9. Follow these steps to prove the final implication in
Theorem 3.3.8.

Assume $K$ satisfies (i) and (ii), and let $\{O_\lambda : \lambda \in \Lambda\}$
be an open cover for $K$. For contradiction, let’s assume that no finite
subcover exists. Let $I_0$ be a closed interval containing $K$.

(a) Show that there exists a nested sequence of closed intervals
$I_0 \supseteq I_1 \supseteq I_2 \supseteq \cdots$ with the property that, for
each $n$, $I_n \cap K$ cannot be finitely covered and $\lim |I_n| = 0$.

(b) Argue that there exists an $x \in K$ such that $x \in I_n$ for all $n$.

(c) Because $x \in K$, there must exist an open set $O_{\lambda_0}$ from the
original collection that contains $x$ as an element. Explain how this leads to
the desired contradiction.


Exercise 3.3.10. Here is an alternate proof to the one given in Exercise 3.3.9
for the final implication in the Heine-Borel Theorem.

Consider the special case where $K$ is a closed interval. Let
$\{O_\lambda : \lambda \in \Lambda\}$ be an open cover for $[a,b]$ and define
$S$ to be the set of all $x \in [a,b]$ such that $[a,x]$ has a finite subcover
from $\{O_\lambda : \lambda \in \Lambda\}$.

(a) Argue that $S$ is nonempty and bounded, and thus $s = \sup S$ exists.

(b) Now show $s = b$, which implies $[a,b]$ has a finite subcover.

(c) Finally, prove the theorem for an arbitrary closed and bounded set $K$.

Exercise  3.3.11.  Consider  each  of  the  sets  listed  in  Exercise  3.3.2.  For  each 
one  that  is  not  compact,  find  an  open  cover  for  which  there  is  no  finite  subcover. 

Exercise 3.3.12. Using the concept of open covers (and explicitly avoiding
the Bolzano-Weierstrass Theorem), prove that every bounded infinite set has a
limit point.

Exercise  3.3.13.  Let’s  call  a  set  clompact  if  it  has  the  property  that  every 
closed  cover  (i.e.,  a  cover  consisting  of  closed  sets)  admits  a  finite  subcover. 
Describe  all  of  the  clompact  subsets  of  R. 




3.4  Perfect  Sets  and  Connected  Sets 

One  of  the  underlying  goals  of  topology  is  to  strip  away  all  of  the  extraneous 
information  that  comes  with  our  intuitive  picture  of  the  real  numbers  and  isolate 
just  those  properties  that  are  responsible  for  the  phenomenon  we  are  studying. 
For  example,  we  were  quick  to  observe  that  any  closed  interval  is  a  compact 
set.  The  content  of  Theorem  3.3.4,  however,  is  that  the  compactness  of  a  closed 
interval  has  nothing  to  do  with  the  fact  that  the  set  is  an  interval  but  is  a 
consequence  of  the  set  being  bounded  and  closed.  In  Chapter  1,  we  argued  that 
the  set  of  real  numbers  between  0  and  1  is  an  uncountable  set.  This  turns  out  to 
be  the  case  for  any  nonempty  closed  set  that  does  not  contain  isolated  points. 

Perfect  Sets 

Definition 3.4.1. A set $P \subseteq \mathbf{R}$ is perfect if it is closed and
contains no isolated points.

Closed  intervals  (other  than  the  singleton  sets  [a,  a])  serve  as  the  most 
obvious  class  of  perfect  sets,  but  there  are  more  interesting  examples. 

Example 3.4.2 (Cantor Set). It is not too hard to see that the Cantor set is
perfect. In Section 3.1, we defined the Cantor set as the intersection

$$C = \bigcap_{n=0}^{\infty} C_n,$$

where each $C_n$ is a finite union of closed intervals. By Theorem 3.2.14, each
$C_n$ is closed, and by the same theorem, $C$ is closed as well. It remains to
show that no point in $C$ is isolated.

Let $x \in C$ be arbitrary. To convince ourselves that $x$ is not isolated, we
must construct a sequence $(x_n)$ of points in $C$, different from $x$, that
converges to $x$. From our earlier discussion, we know that $C$ at least
contains the endpoints of the intervals that make up each $C_n$. In
Exercise 3.4.3, we sketch the argument that these are all that is needed to
construct $(x_n)$.
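
The role of the endpoints can be explored numerically. The following Python
sketch is an illustration under stated assumptions, not the argument of
Exercise 3.4.3: the helper names `cantor_component` and `nearby_endpoint` are
hypothetical, and points are represented as exact rationals so that membership
tests are not affected by rounding. Given a point of the Cantor set and a
stage $n$, it locates the component interval of $C_n$ containing the point and
returns an endpoint of that interval different from the point itself.

```python
from fractions import Fraction


def cantor_component(x, n):
    """Return the closed interval (a, b) of C_n that contains x,
    or None if x is not in C_n."""
    if not 0 <= x <= 1:
        return None
    a, b = Fraction(0), Fraction(1)
    for _ in range(n):
        third = (b - a) / 3
        if a <= x <= a + third:        # x survives in the left third
            b = a + third
        elif b - third <= x <= b:      # x survives in the right third
            a = b - third
        else:                          # x was in a removed open middle third
            return None
    return a, b


def nearby_endpoint(x, n):
    """Return an endpoint of the C_n component containing x that differs
    from x; its distance from x is at most the component length 3**(-n)."""
    component = cantor_component(x, n)
    if component is None:
        raise ValueError("x is not in C_n")
    a, b = component
    return b if x == a else a
```

Calling `nearby_endpoint(Fraction(1, 4), n)` for increasing `n` yields
endpoints that differ from 1/4 and approach it, which is the behavior the
sequence $(x_n)$ is meant to have.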

One  argument  for  the  uncountability  of  the  Cantor  set  was  presented  in 
Section  3.1.  Another,  perhaps  more  satisfying,  argument  for  the  same  conclusion 
can  be  obtained  from  the  next  theorem. 

Theorem  3.4.3.  A  nonempty  perfect  set  is  uncountable. 

Proof. If $P$ is perfect and nonempty, then it must be infinite because
otherwise it would consist only of isolated points. Let’s assume, for
contradiction, that $P$ is countable. Thus, we can write

$$P = \{x_1, x_2, x_3, \ldots\},$$

where every element of $P$ appears on this list. The idea is to construct a
sequence of nested compact sets $K_n$, all contained in $P$, with the property
that $x_1 \notin K_2$, $x_2 \notin K_3$, $x_3 \notin K_4$, .... Some care must
be taken to ensure that each $K_n$ is nonempty, for then we can use
Theorem 3.3.5 to produce an

$$x \in \bigcap_{n=1}^{\infty} K_n \subseteq P$$

that cannot be on the list $\{x_1, x_2, x_3, \ldots\}$.

Let $I_1$ be a closed interval that contains $x_1$ in its interior (i.e., $x_1$
is not an endpoint of $I_1$). Now, $x_1$ is not isolated, so there exists some
other point $y_2 \in P$ that is also in the interior of $I_1$. Construct a
closed interval $I_2$, centered on $y_2$, so that $I_2 \subseteq I_1$ but
$x_1 \notin I_2$. More explicitly, if $I_1 = [a, b]$, let

$$\epsilon = \min\{y_2 - a,\ b - y_2,\ |x_1 - y_2|\}.$$

Then, the interval $I_2 = [y_2 - \epsilon/2, y_2 + \epsilon/2]$ has the desired
properties.

[Figure: the interval $I_1$ containing the points $x_1$ and $y_2$, with the
smaller interval $I_2$ centered on $y_2$.]

This process can be continued. Because $y_2 \in P$ is not isolated, there must
exist another point $y_3 \in P$ in the interior of $I_2$, and we may insist that
$y_3 \ne x_2$. Now, construct $I_3$ centered on $y_3$ and small enough so that
$x_2 \notin I_3$ and $I_3 \subseteq I_2$. Observe that
$I_3 \cap P \ne \emptyset$ because this intersection contains at least $y_3$.

If we carry out this construction inductively, the result is a sequence of closed
intervals $I_n$ satisfying

(i) $I_{n+1} \subseteq I_n$,

(ii) $x_n \notin I_{n+1}$, and

(iii) $I_n \cap P \ne \emptyset$.

To finish the proof, we let $K_n = I_n \cap P$. For each $n \in \mathbf{N}$, we
have that $K_n$ is closed because it is the intersection of closed sets, and
bounded because it is contained in the bounded set $I_n$. Hence, $K_n$ is
compact. By construction, $K_n$ is not empty and $K_{n+1} \subseteq K_n$. Thus,
we can employ the Nested Compact Set Property (Theorem 3.3.5) to conclude that
the intersection

$$\bigcap_{n=1}^{\infty} K_n \ne \emptyset.$$

But each $K_n$ is a subset of $P$, and the fact that $x_n \notin I_{n+1}$ leads
to the conclusion that $\bigcap_{n=1}^{\infty} K_n = \emptyset$, which is the
sought-after contradiction. □
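
The interval-shrinking step of this proof can be sketched in Python. This is
an illustration under simplifying assumptions, not the general argument: the
perfect set is taken to be the stand-in $P = [0,1]$ (so every interior point
of a subinterval is automatically in $P$), the starting interval is $[0,1]$
itself rather than one chosen around the first listed point, and the names
`shrink` and `nested_intervals` are hypothetical. Given any finite portion of
a proposed list, the code produces nested closed intervals, each of which
excludes the listed point handled at that stage.

```python
from fractions import Fraction


def shrink(interval, x_excluded):
    """One step of the construction with P = [0, 1] as a stand-in perfect
    set: given a closed interval [a, b] inside P and a listed point, return
    a closed subinterval of [a, b] that does not contain the listed point."""
    a, b = interval
    # Pick an interior point y different from the point to exclude.
    y = a + (b - a) / 3
    if y == x_excluded:
        y = a + 2 * (b - a) / 3
    # Same choice of epsilon as in the proof: stay inside [a, b] and
    # stay away from the excluded point.
    eps = min(y - a, b - y, abs(x_excluded - y))
    return (y - eps / 2, y + eps / 2)


def nested_intervals(listed_points, start=(Fraction(0), Fraction(1))):
    """Return the list of intervals obtained by excluding the listed points
    one at a time, beginning from the interval `start`."""
    intervals = [start]
    for x_n in listed_points:
        intervals.append(shrink(intervals[-1], x_n))
    return intervals
```

Because the intervals are nested, the last interval returned misses every
point supplied, which is the finite shadow of the contradiction reached in the
proof.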

Connected Sets

Although  the  two  open  intervals  (1,2)  and  (2,5)  have  the  limit  point  x  =  2  in 
common,  there  is  still  some  space  between  them  in  the  sense  that  no  limit  point 
of  one  of  these  intervals  is  actually  contained  in  the  other.  Said  another  way, 
the  closure  of  (1,  2)  (see  Definition  3.2.11)  is  disjoint  from  (2,  5),  and  the  closure 
of  (2,5)  does  not  intersect  (1,2).  Notice  that  this  same  observation  cannot  be 
made  about  (1,2]  and  (2,5),  even  though  these  latter  sets  are  disjoint. 

Definition 3.4.4. Two nonempty sets $A, B \subseteq \mathbf{R}$ are separated if
$\overline{A} \cap B$ and $A \cap \overline{B}$ are both empty. A set
$E \subseteq \mathbf{R}$ is disconnected if it can be written as
$E = A \cup B$, where $A$ and $B$ are nonempty separated sets.

A  set  that  is  not  disconnected  is  called  a  connected  set. 

Example 3.4.5. (i) If we let $A = (1,2)$ and $B = (2,5)$, then it is not
difficult to verify that $E = (1,2) \cup (2,5)$ is disconnected. Notice that the
sets $C = (1,2]$ and $D = (2,5)$ are not separated because
$C \cap \overline{D} = \{2\}$ is not empty. This should be comforting. The union
$C \cup D$ is equal to the interval $(1,5)$, which better not qualify as a
disconnected set. We will prove in a moment that every interval is a connected
subset of $\mathbf{R}$ and vice versa.

(ii) Let’s show that the set of rational numbers is disconnected. If we let

$$A = \mathbf{Q} \cap (-\infty, \sqrt{2}) \quad \text{and} \quad B = \mathbf{Q} \cap (\sqrt{2}, \infty),$$

then we certainly have $\mathbf{Q} = A \cup B$. The fact that
$A \subseteq (-\infty, \sqrt{2})$ implies (by the Order Limit Theorem) that any
limit point of $A$ will necessarily fall in $(-\infty, \sqrt{2}]$. Because this
is disjoint from $B$, we get $\overline{A} \cap B = \emptyset$. We can similarly
show that $A \cap \overline{B} = \emptyset$, which implies that $A$ and $B$ are
separated.

The  definition  of  connected  is  stated  as  the  negation  of  disconnected,  but  a 
little  care  with  the  logical  negation  of  the  quantifiers  in  Definition  3.4.4  results 
in  a  positive  characterization  of  connectedness.  Essentially,  a  set  E  is  connected 
if,  no  matter  how  it  is  partitioned  into  two  nonempty  disjoint  sets,  it  is  always 
possible  to  show  that  at  least  one  of  the  sets  contains  a  limit  point  of  the  other. 

Theorem 3.4.6. A set $E \subseteq \mathbf{R}$ is connected if and only if, for
all nonempty disjoint sets $A$ and $B$ satisfying $E = A \cup B$, there always
exists a convergent sequence $(x_n) \to x$ with $(x_n)$ contained in one of $A$
or $B$, and $x$ an element of the other.

Proof.  Exercise  3.4.6.  □ 

The concept of connectedness is more relevant when working with subsets
of the plane and other higher-dimensional spaces. This is because, in
$\mathbf{R}$, the connected sets coincide precisely with the collection of
intervals (with the understanding that unbounded intervals such as
$(-\infty, 3)$ and $[0, \infty)$ are included).

Theorem 3.4.7. A set $E \subseteq \mathbf{R}$ is connected if and only if
whenever $a < c < b$ with $a, b \in E$, it follows that $c \in E$ as well.

Proof. Assume $E$ is connected, and let $a, b \in E$ and $a < c < b$. Set

$$A = (-\infty, c) \cap E \quad \text{and} \quad B = (c, \infty) \cap E.$$

Because $a \in A$ and $b \in B$, neither set is empty and, just as in
Example 3.4.5 (ii), neither set contains a limit point of the other. If
$E = A \cup B$, then we would have that $E$ is disconnected, which it is not. It
must then be that $A \cup B$ is missing some element of $E$, and $c$ is the only
possibility. Thus, $c \in E$.

Conversely, assume that $E$ is an interval in the sense that whenever
$a, b \in E$ satisfy $a < c < b$ for some $c$, then $c \in E$. Our intent is to
use the characterization of connected sets in Theorem 3.4.6, so let
$E = A \cup B$, where $A$ and $B$ are nonempty and disjoint. We need to show
that one of these sets contains a limit point of the other. Pick $a_0 \in A$
and $b_0 \in B$, and, for the sake of the argument, assume $a_0 < b_0$. Because
$E$ is itself an interval, the interval $I_0 = [a_0, b_0]$ is contained in $E$.
Now, bisect $I_0$ into two equal halves. The midpoint of $I_0$ must either be in
$A$ or $B$, and so choose $I_1 = [a_1, b_1]$ to be the half that allows us to
have $a_1 \in A$ and $b_1 \in B$. Continuing this process yields a sequence of
nested intervals $I_n = [a_n, b_n]$, where $a_n \in A$, $b_n \in B$, and the
length $(b_n - a_n) \to 0$. The remainder of this argument should feel familiar.
By the Nested Interval Property, there exists an

$$x \in \bigcap_{n=0}^{\infty} I_n,$$

and it is straightforward to show that the sequences of endpoints each satisfy
$\lim a_n = x$ and $\lim b_n = x$. But now $x \in E$ must belong to either $A$
or $B$, thus making it a limit point of the other. This completes the
argument. □
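
The bisection in the second half of this proof is easy to simulate. The
Python sketch below is only an illustration: the function name
`bisect_partition` and the membership test `in_A` are hypothetical, the
partition is described by a predicate rather than by explicit sets, and
floating-point numbers stand in for real numbers, so only finitely many
halvings are meaningful.

```python
def bisect_partition(in_A, a0, b0, steps=50):
    """Given a0 in A and b0 in B with a0 < b0 (where B is the part of the
    interval not in A), repeatedly halve [a, b], keeping the left endpoint
    in A and the right endpoint in B. Returns the final pair (a, b)."""
    a, b = a0, b0
    for _ in range(steps):
        mid = (a + b) / 2
        if in_A(mid):
            a = mid      # midpoint is in A: keep the right half [mid, b]
        else:
            b = mid      # midpoint is in B: keep the left half [a, mid]
    return a, b


# Example partition of E = [0, 2]: A = {t : t*t < 2}, B = the rest of E.
a, b = bisect_partition(lambda t: t * t < 2, 0.0, 2.0)
```

In this example both endpoints close in on the square root of 2, which
belongs to $B$ and is a limit of points of $A$, in line with the conclusion of
the proof.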


Exercises 


Exercise 3.4.1. If $P$ is a perfect set and $K$ is compact, is the intersection
$P \cap K$ always compact? Always perfect?

Exercise 3.4.2. Does there exist a perfect set consisting of only rational
numbers?


Exercise 3.4.3. Review the portion of the proof given in Example 3.4.2 and
follow these steps to complete the argument.

(a) Because $x \in C_1$, argue that there exists an $x_1 \in C \cap C_1$ with
$x_1 \ne x$ satisfying $|x - x_1| \le 1/3$.

(b) Finish the proof by showing that for each $n \in \mathbf{N}$, there exists
$x_n \in C \cap C_n$, different from $x$, satisfying $|x - x_n| \le 1/3^n$.


Exercise 3.4.4. Repeat the Cantor construction from Section 3.1 starting with
the interval $[0,1]$. This time, however, remove the open middle fourth from each
component.

(a) Is the resulting set compact? Perfect?

(b)  Using  the  algorithms  from  Section  3.1,  compute  the  length  and  dimension 
of  this  Cantor-like  set. 

Exercise 3.4.5. Let $A$ and $B$ be nonempty subsets of $\mathbf{R}$. Show that
if there exist disjoint open sets $U$ and $V$ with $A \subseteq U$ and
$B \subseteq V$, then $A$ and $B$ are separated.

Exercise  3.4.6.  Prove  Theorem  3.4.6. 

Exercise 3.4.7. A set $E$ is totally disconnected if, given any two distinct
points $x, y \in E$, there exist separated sets $A$ and $B$ with $x \in A$,
$y \in B$, and $E = A \cup B$.

(a)  Show  that  Q  is  totally  disconnected. 

(b)  Is  the  set  of  irrational  numbers  totally  disconnected? 

Exercise 3.4.8. Follow these steps to show that the Cantor set is totally
disconnected in the sense described in Exercise 3.4.7.

Let $C = \bigcap_{n=0}^{\infty} C_n$, as defined in Section 3.1.

(a) Given $x, y \in C$, with $x < y$, set $\epsilon = y - x$. For each
$n = 0, 1, 2, \ldots$, the set $C_n$ consists of a finite number of closed
intervals. Explain why there must exist an $N$ large enough so that it is
impossible for $x$ and $y$ both to belong to the same closed interval of $C_N$.

(b) Show that $C$ is totally disconnected.

Exercise 3.4.9. Let $\{r_1, r_2, r_3, \ldots\}$ be an enumeration of the
rational numbers, and for each $n \in \mathbf{N}$ set $\epsilon_n = 1/2^n$.
Define $O = \bigcup_{n=1}^{\infty} V_{\epsilon_n}(r_n)$, and let $F = O^c$.

(a)  Argue  that  F  is  a  closed,  nonempty  set  consisting  only  of  irrational 
numbers. 

(b)  Does  F  contain  any  nonempty  open  intervals?  Is  F  totally  disconnected? 
(See  Exercise  3.4.7  for  the  definition.) 

(c)  Is  it  possible  to  know  whether  F  is  perfect?  If  not,  can  we  modify  this 
construction  to  produce  a  nonempty  perfect  set  of  irrational  numbers? 

3.5  Baire’s  Theorem 

The nature of the real line can be deceptively elusive. The closer we look, the
more intricate and enigmatic $\mathbf{R}$ becomes, and the more we are reminded
to proceed carefully (i.e., axiomatically) with all of our conclusions about
properties of subsets of $\mathbf{R}$. The structure of open sets is fairly
straightforward. Every open set is either a finite or countable union of open
intervals. Standing in opposition to this tidy description of all open sets is
the Cantor set. The Cantor set is a closed, uncountable set that contains no
intervals of any kind. Thus, no such characterization of closed sets should be
anticipated.

Recall  that  the  arbitrary  union  of  open  sets  is  always  an  open  set.  Likewise, 
the  arbitrary  intersection  of  closed  sets  is  closed.  By  taking  unions  of  closed  sets 
or  intersections  of  open  sets,  however,  it  is  possible  to  obtain  a  new  selection  of 
subsets  of  R. 

Definition 3.5.1. A set $A \subseteq \mathbf{R}$ is called an $F_\sigma$ set if
it can be written as the countable union of closed sets. A set
$B \subseteq \mathbf{R}$ is called a $G_\delta$ set if it can be written as the
countable intersection of open sets.

Exercise 3.5.1. Argue that a set $A$ is a $G_\delta$ set if and only if its
complement is an $F_\sigma$ set.


Exercise 3.5.2. Replace each ____ with the word finite or countable,
depending on which is more appropriate.

(a) The ____ union of $F_\sigma$ sets is an $F_\sigma$ set.

(b) The ____ intersection of $F_\sigma$ sets is an $F_\sigma$ set.

(c) The ____ union of $G_\delta$ sets is a $G_\delta$ set.

(d) The ____ intersection of $G_\delta$ sets is a $G_\delta$ set.

Exercise  3.5.3.  (This  exercise  has  already  appeared  as  Exercise  3.2.15.) 

(a) Show that a closed interval $[a,b]$ is a $G_\delta$ set.

(b) Show that the half-open interval $(a,b]$ is both a $G_\delta$ and an
$F_\sigma$ set.

(c) Show that $\mathbf{Q}$ is an $F_\sigma$ set, and the set of irrationals
$\mathbf{I}$ forms a $G_\delta$ set.

It is not readily obvious that the class $F_\sigma$ does not include every
subset of $\mathbf{R}$, but we are now ready to argue that $\mathbf{I}$ is not
an $F_\sigma$ set (and consequently $\mathbf{Q}$ is not a $G_\delta$ set). This
will follow from a theorem due to René Louis Baire (1874-1932).

Recall that a set $G \subseteq \mathbf{R}$ is dense in $\mathbf{R}$ if, given
any two real numbers $a < b$, it is possible to find a point $x \in G$ with
$a < x < b$.

Theorem 3.5.2. If $\{G_1, G_2, G_3, \ldots\}$ is a countable collection of
dense, open sets, then the intersection $\bigcap_{n=1}^{\infty} G_n$ is not
empty.

Proof.  Before  embarking  on  the  proof,  notice  that  we  have  seen  a  conclusion 
like  this  before.  Theorem  3.3.5  asserts  that  a  nested  sequence  of  compact  sets 
has  a  nontrivial  intersection.  In  this  theorem,  we  are  dealing  with  dense,  open 
sets,  but  as  it  turns  out,  we  are  going  to  use  Theorem  3.3.5 — and  actually,  just 
the  Nested  Interval  Property — as  the  crucial  step  in  the  argument. 


Exercise 3.5.4. Starting with $n = 1$, inductively construct a nested sequence
of closed intervals $I_1 \supseteq I_2 \supseteq I_3 \supseteq \cdots$
satisfying $I_n \subseteq G_n$. Give special attention to the issue of the
endpoints of each $I_n$. Show how this leads to a proof of the theorem. □

Exercise 3.5.5. Show that it is impossible to write

$$\mathbf{R} = \bigcup_{n=1}^{\infty} F_n,$$

where for each $n \in \mathbf{N}$, $F_n$ is a closed set containing no nonempty
open intervals.

Exercise 3.5.6. Show how the previous exercise implies that the set
$\mathbf{I}$ of irrationals cannot be an $F_\sigma$ set, and $\mathbf{Q}$ cannot
be a $G_\delta$ set.

Exercise 3.5.7. Using Exercise 3.5.6 and versions of the statements in
Exercise 3.5.2, construct a set that is neither in $F_\sigma$ nor in
$G_\delta$.

Nowhere-Dense  Sets 

We  have  encountered  several  equivalent  ways  to  assert  that  a  particular  set  G 
is  dense  in  R.  In  Section  3.2,  we  observed  that  G  is  dense  in  R  if  and  only  if 
every  point  of  R  is  a  limit  point  of  G.  Because  the  closure  of  any  set  is  obtained 
by  taking  the  union  of  the  set  and  its  limit  points,  we  have  that 

$G$ is dense in $\mathbf{R}$ if and only if $\overline{G} = \mathbf{R}$.

The  set  Q  is  dense  in  R;  the  set  Z  is  clearly  not.  In  fact,  in  the  jargon  of 
analysis,  Z  is  nowhere-dense  in  R. 

Definition 3.5.3. A set $E$ is nowhere-dense if $\overline{E}$ contains no
nonempty open intervals.

Exercise 3.5.8. Show that a set $E$ is nowhere-dense in $\mathbf{R}$ if and
only if the complement of $\overline{E}$ is dense in $\mathbf{R}$.

Exercise  3.5.9.  Decide  whether  the  following  sets  are  dense  in  R,  nowhere- 
dense  in  R,  or  somewhere  in  between. 

(a) $A = \mathbf{Q} \cap [0,5]$.

(b) $B = \{1/n : n \in \mathbf{N}\}$.

(c)  the  set  of  irrationals. 

(d)  the  Cantor  set. 

We  can  now  restate  Theorem  3.5.2  in  a  slightly  more  general  form. 

Theorem  3.5.4  (Baire’s  Theorem).  The  set  of  real  numbers  R  cannot  be 
written  as  the  countable  union  of  nowhere-dense  sets. 

Proof. For contradiction, assume that $E_1, E_2, E_3, \ldots$ are each
nowhere-dense and satisfy $\mathbf{R} = \bigcup_{n=1}^{\infty} E_n$.

Exercise 3.5.10. Finish the proof by finding a contradiction to the results in
this section. □

Baire’s Theorem is yet another statement about the size of $\mathbf{R}$. We have
already  encountered  several  ways  to  describe  the  sizes  of  infinite  sets.  In  terms 
of  cardinality,  countable  sets  are  relatively  small  whereas  uncountable  sets  are 
large.  We  also  briefly  discussed  the  concept  of  “length,”  or  “measure,”  in 
Section  3.1.  Baire’s  Theorem  offers  a  third  perspective.  From  this  point  of 
view,  nowhere-dense  sets  are  considered  to  be  “thin”  sets.  Any  set  that  is  the 
countable  union — i.e.,  a  not  very  large  union — of  these  small  sets  is  called  a 
“meager”  set  or  a  set  of  “first  category.”  A  set  that  is  not  of  first  category  is  of 
“second  category.”  Intuitively,  sets  of  the  second  category  are  the  “fat”  subsets. 
The  Baire  Category  Theorem,  as  it  is  often  called,  states  that  R  is  of  second 
category. 

There  is  a  significance  to  the  Baire  Category  Theorem  that  is  difficult  to 
appreciate  at  the  moment  because  we  are  only  seeing  a  special  case  of  this  result. 
The  real  numbers  are  an  example  of  a  complete  metric  space.  Metric  spaces  are 
discussed  in  some  detail  in  Section  8.2,  but  here  is  the  basic  idea.  Given  a  set 
of  mathematical  objects  such  as  real  numbers,  points  in  the  plane  or  continuous 
functions  defined  on  [0,1],  a  “metric”  is  a  rule  that  assigns  a  “distance”  between 
two  elements  in  the  set.  In  R,  we  have  been  using  \x  —  y\  as  the  distance  between 
the  real  numbers  x  and  y.  The  point  is  that  if  we  can  create  a  satisfactory  notion 
of  “distance”  on  these  other  spaces  (we  will  need  the  triangle  inequality  to  hold, 
for  instance),  then  the  concepts  of  convergence,  Cauchy  sequences,  and  open 
sets,  for  example,  can  be  naturally  transferred  over.  A  complete  metric  space  is 
any  set  with  a  suitably  defined  metric  in  which  Cauchy  sequences  have  limits. 
We  have  spent  a  good  deal  of  time  discussing  the  fact  that  R  is  a  complete 
metric  space  whereas  Q  is  not. 

The Baire Category Theorem in its more general form states that any complete
metric space must be too large to be the countable union of nowhere-dense
subsets. One particularly interesting example of a complete metric space is the
set of continuous functions defined on the interval $[0,1]$. (The distance
between two functions $f$ and $g$ in this space is defined to be
$\sup |f(x) - g(x)|$, where $x \in [0,1]$.) Now, in this space we will see that
the collection of continuous functions that are differentiable at even one point
can be written as the countable union of nowhere-dense sets. Thus, a fascinating
consequence of Baire’s Theorem in this setting is that most continuous functions
do not have derivatives at any point. Chapter 5 concludes with a construction of
one such function. This odd situation mirrors the roles of $\mathbf{Q}$ and
$\mathbf{I}$ as subsets of $\mathbf{R}$. Just as the familiar rational numbers
constitute a minute proportion of the real line, the differentiable functions of
calculus are exceedingly atypical of continuous functions in general.
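
The distance between two continuous functions on $[0,1]$ mentioned above can
be approximated numerically. The Python sketch below is an assumption-laden
illustration: the name `sup_distance` is hypothetical, and the supremum is
replaced by a maximum over an evenly spaced grid, so the value returned is only
a lower bound for the true distance (a good one when the functions do not
oscillate faster than the grid can resolve).

```python
def sup_distance(f, g, samples=10_001):
    """Approximate sup |f(x) - g(x)| over [0, 1] by the largest value of
    |f(x) - g(x)| on a grid of `samples` evenly spaced points."""
    last = samples - 1
    return max(abs(f(k / last) - g(k / last)) for k in range(samples))


# Distance between x and x**2; the gap x - x**2 is largest at x = 1/2.
d = sup_distance(lambda x: x, lambda x: x * x)
```

On a fixed grid this approximate distance still satisfies the triangle
inequality, the property singled out earlier as essential for a metric.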


Chapter  4 


Functional Limits and Continuity

4.1 Discussion: Examples of Dirichlet and Thomae

Although it is a common practice in calculus courses to discuss continuity before
differentiation, historically mathematicians’ attention to the concept of
continuity came long after the derivative was in wide use. Pierre de Fermat
(1601-1665) was using tangent lines to solve optimization problems as early as
1629. On the other hand, it was not until around 1820 that Cauchy, Bolzano,
Weierstrass, and others began to characterize continuity in terms more rigorous
than prevailing intuitive notions such as “unbroken curves” or “functions which
have no jumps or gaps.” The basic reason for this two-hundred-year waiting
period lies in the fact that, for most of this time, the very notion of function
did not really permit discontinuities. Functions were entities such as
polynomials, sines, and cosines, always smooth and continuous over their
relevant domains. The gradual liberation of the term function to its modern
understanding (a rule associating a unique output with a given input) was
simultaneous with 19th century investigations into the behavior of infinite
series. Extensions of the power of calculus were intimately tied to the ability
to represent a function $f(x)$ as a limit of polynomials (called a power series)
or as a limit of sums of sines and cosines (called a trigonometric or Fourier
series). A typical question for Cauchy and his contemporaries was whether the
continuity of the limiting polynomials or trigonometric functions necessarily
implied that the limit $f$ would also be continuous.

Sequences and series of functions are the topics of Chapter 6. What is
relevant at this moment is that we realize why the issue of finding a rigorous
definition for continuity finally made its way to the fore. Any significant progress
on the question of whether the limit of continuous functions is continuous
(for Cauchy and for us) necessarily depends on a definition of continuity that
does not rely on imprecise notions such as “no holes” or “gaps.” With a mathematically
unambiguous definition for the limit of a sequence in hand, we are
well on our way toward a rigorous understanding of continuity.

Given a function $f$ with domain $A \subseteq \mathbf{R}$, we want to define continuity at a
point $c \in A$ to mean that if $x \in A$ is chosen near $c$, then $f(x)$ will be near $f(c)$.
Symbolically, we will say $f$ is continuous at $c$ if

$$\lim_{x \to c} f(x) = f(c).$$

The problem is that, at present, we only have a definition for the limit of a
sequence, and it is not entirely clear what is meant by $\lim_{x \to c} f(x)$. The subtleties
that arise as we try to fashion such a definition are well illustrated via a
family of examples, all based on an idea of the prominent German mathematician
Peter Lejeune Dirichlet. Dirichlet’s idea was to define a function $g$ in a
piecewise manner based on whether the input variable $x$ is rational or
irrational. Specifically, let


$$g(x) = \begin{cases} 1 & \text{if } x \in \mathbf{Q} \\ 0 & \text{if } x \notin \mathbf{Q}. \end{cases}$$

The  intricate  way  that  Q  and  I  fit  inside  of  R  makes  an  accurate  graph  of  g 
technically  impossible  to  draw,  but  Figure  4.1  illustrates  the  basic  idea. 

Does it make sense to attach a value to the expression $\lim_{x \to 1/2} g(x)$? One
idea is to consider a sequence $(x_n) \to 1/2$. Using our notion of the limit of
a sequence, we might try to define $\lim_{x \to 1/2} g(x)$ as simply the limit of the
sequence $g(x_n)$. But notice that this limit depends on how the sequence $(x_n)$ is
chosen. If each $x_n$ is rational, then

$$\lim_{n \to \infty} g(x_n) = 1.$$

On the other hand, if $x_n$ is irrational for each $n$, then

$$\lim_{n \to \infty} g(x_n) = 0.$$
This unacceptable situation demands that we work harder on our definition of
functional limits. Generally speaking, we want the value of $\lim_{x \to c} g(x)$ to be
independent of how we approach $c$. In this particular case, the definition of a
functional limit that we agree on should lead to the conclusion that

$$\lim_{x \to 1/2} g(x)$$

does not exist.


Postponing the search for formal definitions for the moment, we should
nonetheless realize that Dirichlet’s function is not continuous at $c = 1/2$. In fact,
the real significance of this function is that there is nothing unique about the
point $c = 1/2$. Because both $\mathbf{Q}$ and $\mathbf{I}$ (the set of irrationals) are dense in the
real line, it follows that for any $z \in \mathbf{R}$ we can find sequences $(x_n) \subseteq \mathbf{Q}$ and
$(y_n) \subseteq \mathbf{I}$ such that

$$\lim x_n = \lim y_n = z.$$

(See Example 3.2.9 (iii).) Because

$$\lim g(x_n) \neq \lim g(y_n),$$

the same line of reasoning reveals that $g(x)$ is not continuous at $z$. In the jargon
of analysis, Dirichlet’s function is a nowhere-continuous function on $\mathbf{R}$.

What  happens  if  we  adjust  the  definition  of  g(x)  in  the  following  way?  Define 
a  new  function  h  (Fig.  4.2)  on  R  by  setting 

$$h(x) = \begin{cases} x & \text{if } x \in \mathbf{Q} \\ 0 & \text{if } x \notin \mathbf{Q}. \end{cases}$$

If we take $c$ different from zero, then just as before we can construct sequences
$(x_n) \to c$ of rationals and $(y_n) \to c$ of irrationals so that

$$\lim h(x_n) = c \quad \text{and} \quad \lim h(y_n) = 0.$$

Thus, $h$ is not continuous at every point $c \neq 0$.
If $c = 0$, however, then these two limits are both equal to $h(0) = 0$. In fact,
it appears as though no matter how we construct a sequence $(z_n)$ converging to
zero, it will always be the case that $\lim h(z_n) = 0$. This observation goes to the
heart of what we want functional limits to entail. To assert that

$$\lim_{x \to c} h(x) = L$$

should imply that

$$h(z_n) \to L \quad \text{for all sequences } (z_n) \to c.$$

For reasons not yet apparent, it is beneficial to fashion the definition for functional
limits in terms of neighborhoods constructed around $c$ and $L$. We will
quickly see, however, that this topological formulation is equivalent to the
sequential characterization we have arrived at here.

To  this  point,  we  have  been  discussing  continuity  of  a  function  at  a  particular 
point  in  its  domain.  This  is  a  significant  departure  from  thinking  of  continuous 
functions  as  curves  that  can  be  drawn  without  lifting  the  pen  from  the  paper, 
and  it  leads  to  some  fascinating  questions.  In  1875,  K.J.  Thomae  discovered  the 
function 


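$$t(x) = \begin{cases} 1 & \text{if } x = 0 \\ 1/n & \text{if } x = m/n \in \mathbf{Q} \setminus \{0\} \text{ is in lowest terms with } n > 0 \\ 0 & \text{if } x \notin \mathbf{Q}. \end{cases}$$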

If $c \in \mathbf{Q}$, then $t(c) > 0$. Because the set of irrationals is dense in $\mathbf{R}$, we can find
a sequence $(y_n)$ in $\mathbf{I}$ converging to $c$. The result is that

$$\lim t(y_n) = 0 \neq t(c),$$

and Thomae’s function (Fig. 4.3) fails to be continuous at any rational point.

The twist comes when we try this argument on some irrational point in the
domain such as $c = \sqrt{2}$. All irrational values get mapped to zero by $t$, so the
natural thing would be to consider a sequence $(x_n)$ of rational numbers that
converges to $\sqrt{2}$. Now, $\sqrt{2} \approx 1.414213\ldots$, so a good start on a particular
sequence of rational approximations for $\sqrt{2}$ might be

$$\left(1, \frac{14}{10}, \frac{141}{100}, \frac{1414}{1000}, \frac{14142}{10000}, \frac{141421}{100000}, \ldots\right).$$

But notice that the denominators of these fractions are getting larger. In this
case, the sequence $t(x_n)$ begins,

$$\left(1, \frac{1}{5}, \frac{1}{100}, \frac{1}{500}, \frac{1}{5000}, \frac{1}{100000}, \ldots\right)$$

and is fast approaching $0 = t(\sqrt{2})$. We will see that this always happens.
The closer a rational number is chosen to a fixed irrational number, the larger
its denominator must necessarily be. As a consequence, Thomae’s function has
the bizarre property of being continuous at every irrational point on $\mathbf{R}$ and
discontinuous at every rational point.
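
The values of $t$ along the decimal approximations of $\sqrt{2}$ can be reproduced with exact rational arithmetic. The following is only a minimal sketch: the helper name `thomae_on_rational` is our own, and it handles rational inputs only, since an irrational number has no exact `Fraction` representation.

```python
from fractions import Fraction


def thomae_on_rational(x: Fraction) -> Fraction:
    """Value of Thomae's function t at a rational input x (hypothetical helper).

    Fraction stores m/n in lowest terms with n > 0 and stores 0 as 0/1,
    so 1/denominator also returns t(0) = 1.
    """
    return Fraction(1, x.denominator)


# Truncated decimal expansions of sqrt(2): 1, 14/10, 141/100, ...
digits = "141421"
approximations = [
    Fraction(int(digits[:k]), 10 ** (k - 1)) for k in range(1, len(digits) + 1)
]
values = [thomae_on_rational(x) for x in approximations]

for x, value in zip(approximations, values):
    print(f"x = {x}, t(x) = {value}")
```

Because `Fraction` reduces each approximation to lowest terms automatically, the printed values of $t(x_n)$ shrink exactly when the reduced denominators grow, which is the point of the discussion above.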

Is  there  an  example  of  a  function  with  the  opposite  property?  In  other  words, 
does  there  exist  a  function  defined  on  all  of  R  that  is  continuous  on  Q  but  fails 
to  be  continuous  on  I?  Can  the  set  of  discontinuities  of  a  particular  function  be 
arbitrary? If we are given some set $A \subseteq \mathbf{R}$, is it always possible to find a function
that is continuous only on the set $A^c$? In each of the examples in this section, the
functions  were  defined  to  have  erratic  oscillations  around  points  in  the  domain. 
What  conclusions  can  we  draw  if  we  restrict  our  attention  to  functions  that 
are  somewhat  less  volatile?  One  such  class  is  the  set  of  so-called  monotone 
functions,  which  are  either  increasing  or  decreasing  on  a  given  domain.  What 
might  we  be  able  to  say  about  the  set  of  discontinuities  of  a  monotone  function 
on  R? 


4.2  Functional  Limits 

Consider a function $f : A \to \mathbf{R}$. Recall that a limit point $c$ of $A$ is a point with
the property that every $\epsilon$-neighborhood $V_\epsilon(c)$ intersects $A$ in some point other
than $c$. Equivalently, $c$ is a limit point of $A$ if and only if $c = \lim x_n$ for some
sequence $(x_n) \subseteq A$ with $x_n \neq c$. It is important to remember that limit points
of $A$ do not necessarily belong to the set $A$ unless $A$ is closed.

If $c$ is a limit point of the domain of $f$, then, intuitively, the statement

$$\lim_{x \to c} f(x) = L$$

is intended to convey that values of $f(x)$ get arbitrarily close to $L$ as $x$ is chosen
closer and closer to $c$. The issue of what happens when $x = c$ is irrelevant from
the point of view of functional limits. In fact, $c$ need not even be in the domain
of $f$.

The structure of the definition of functional limits follows the “challenge-response”
pattern established in the definition for the limit of a sequence. Recall
that given a sequence $(a_n)$, the assertion $\lim a_n = L$ implies that for every
$\epsilon$-neighborhood $V_\epsilon(L)$ centered at $L$, there is a point in the sequence (call it
$a_N$) after which all of the terms $a_n$ fall in $V_\epsilon(L)$. Each $\epsilon$-neighborhood represents
a particular challenge, and each $N$ is the respective response. For functional
limit statements such as $\lim_{x \to c} f(x) = L$, the challenges are still made in
the form of an arbitrary $\epsilon$-neighborhood around $L$, but the response this time
is a $\delta$-neighborhood centered at $c$.

Figure 4.4: Definition of Functional Limit. (The vertical axis marks $L - \epsilon$, $L$, and $L + \epsilon$, bounding $V_\epsilon(L)$; the horizontal axis marks $c - \delta$, $c$, and $c + \delta$, bounding $V_\delta(c)$.)


Definition 4.2.1 (Functional Limit). Let $f : A \to \mathbf{R}$, and let $c$ be a limit
point of the domain $A$. We say that $\lim_{x \to c} f(x) = L$ provided that, for all
$\epsilon > 0$, there exists a $\delta > 0$ such that whenever $0 < |x - c| < \delta$ (and $x \in A$) it
follows that $|f(x) - L| < \epsilon$.

This is often referred to as the “$\epsilon$-$\delta$ version” of the definition for functional
limits. Recall that the statement

$$|f(x) - L| < \epsilon \quad \text{is equivalent to} \quad f(x) \in V_\epsilon(L).$$

Likewise, the statement

$$|x - c| < \delta \quad \text{is satisfied if and only if} \quad x \in V_\delta(c).$$

The additional restriction $0 < |x - c|$ is just an economical way of saying $x \neq c$.


Recasting  Definition  4.2.1  in  terms  of  neighborhoods — just  as  we  did  for  the 
definition  of  convergence  of  a  sequence  in  Section  2.2 — amounts  to  little  more 
than  a  change  of  notation,  but  it  does  help  emphasize  the  geometrical  nature  of 
what  is  happening  (Fig.  4.4). 


Definition 4.2.1B (Functional Limit: Topological Version). Let $c$ be a
limit point of the domain of $f : A \to \mathbf{R}$. We say $\lim_{x \to c} f(x) = L$ provided
that, for every $\epsilon$-neighborhood $V_\epsilon(L)$ of $L$, there exists a $\delta$-neighborhood $V_\delta(c)$
around $c$ with the property that for all $x \in V_\delta(c)$ different from $c$ (with $x \in A$)
it follows that $f(x) \in V_\epsilon(L)$.

The parenthetical reminder “($x \in A$)” present in both versions of the definition
is included to ensure that $x$ is an allowable input for the function in
question. When no confusion is likely, we may omit this reminder with the
understanding that the appearance of $f(x)$ carries with it the implicit assumption
that $x$ is in the domain of $f$. On a related note, there is no reason to discuss
functional  limits  at  isolated  points  of  the  domain.  Thus,  functional  limits  will 
only  be  considered  as  x  tends  toward  a  limit  point  of  the  function’s  domain. 

Example 4.2.2. (i) To familiarize ourselves with Definition 4.2.1, let’s prove
that if $f(x) = 3x + 1$, then

$$\lim_{x \to 2} f(x) = 7.$$

Let $\epsilon > 0$. Definition 4.2.1 requires that we produce a $\delta > 0$ so that
$0 < |x - 2| < \delta$ leads to the conclusion $|f(x) - 7| < \epsilon$. Notice that

$$|f(x) - 7| = |(3x + 1) - 7| = |3x - 6| = 3|x - 2|.$$

Thus, if we choose $\delta = \epsilon/3$, then $0 < |x - 2| < \delta$ implies $|f(x) - 7| <
3(\epsilon/3) = \epsilon$.


(ii) Let’s show

$$\lim_{x \to 2} g(x) = 4,$$

where $g(x) = x^2$. Given an arbitrary $\epsilon > 0$, our goal this time is to make
$|g(x) - 4| < \epsilon$ by restricting $|x - 2|$ to be smaller than some carefully
chosen $\delta$. As in the previous problem, a little algebra reveals

$$|g(x) - 4| = |x^2 - 4| = |x + 2||x - 2|.$$

We can make $|x - 2|$ as small as we like, but we need an upper bound on
$|x + 2|$ in order to know how small to choose $\delta$. The presence of the variable
$x$ causes some initial confusion, but keep in mind that we are discussing
the limit as $x$ approaches 2. If we agree that our $\delta$-neighborhood around
$c = 2$ must have radius no bigger than $\delta = 1$, then we get the upper bound
$|x + 2| \leq |3 + 2| = 5$ for all $x \in V_\delta(c)$.

Now, choose $\delta = \min\{1, \epsilon/5\}$. If $0 < |x - 2| < \delta$, then it follows that

$$|x^2 - 4| = |x + 2||x - 2| < (5)\frac{\epsilon}{5} = \epsilon,$$

and the limit is proved.
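
The two responses chosen in Example 4.2.2 can be spot-checked numerically. The sketch below is an illustration rather than a proof: the function names are our own, and a finite sample of points can expose a bad $\delta$ but can never certify a good one.

```python
def delta_for_linear(eps: float) -> float:
    """Response from part (i): f(x) = 3x + 1 at c = 2 with limit 7."""
    return eps / 3


def delta_for_square(eps: float) -> float:
    """Response from part (ii): g(x) = x**2 at c = 2 with limit 4."""
    return min(1.0, eps / 5)


def response_holds(f, c, L, eps, delta, samples=1000):
    """Spot-check |f(x) - L| < eps at sample points with 0 < |x - c| < delta."""
    for k in range(1, samples + 1):
        offset = delta * k / (samples + 1)  # strictly between 0 and delta
        for x in (c - offset, c + offset):
            if not abs(f(x) - L) < eps:
                return False
    return True


for eps in (10.0, 0.5, 0.01):
    ok_linear = response_holds(lambda x: 3 * x + 1, 2.0, 7.0, eps, delta_for_linear(eps))
    ok_square = response_holds(lambda x: x * x, 2.0, 4.0, eps, delta_for_square(eps))
    print(eps, ok_linear, ok_square)

# A careless response for g: taking delta = eps ignores the factor |x + 2|.
print(response_holds(lambda x: x * x, 2.0, 4.0, 0.5, 0.5))
```

The last line shows why the bound on $|x + 2|$ matters: with $\epsilon = 1/2$ the naive choice $\delta = \epsilon$ admits points such as $x \approx 2.4995$, where $|x^2 - 4|$ is well above $1/2$.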
Sequential Criterion for Functional Limits

We worked very hard in Chapter 2 to derive an impressive list of properties
enjoyed by sequential limits. In particular, the Algebraic Limit Theorem
(Theorem 2.3.3) and the Order Limit Theorem (Theorem 2.3.4) proved invaluable
in a large number of the arguments that followed. Not surprisingly, we
are going to need analogous statements for functional limits. Although it is not
difficult to generate independent proofs for these statements, all of them will
follow quite naturally from their sequential analogs once we derive the sequential
criterion for functional limits motivated in the opening discussion of this
chapter.

Theorem 4.2.3 (Sequential Criterion for Functional Limits). Given a
function $f : A \to \mathbf{R}$ and a limit point $c$ of $A$, the following two statements are
equivalent:

(i) $\lim_{x \to c} f(x) = L$.

(ii) For all sequences $(x_n) \subseteq A$ satisfying $x_n \neq c$ and $(x_n) \to c$, it follows that
$f(x_n) \to L$.

Proof. ($\Rightarrow$) Let’s first assume that $\lim_{x \to c} f(x) = L$. To prove (ii), we consider
an arbitrary sequence $(x_n)$, which converges to $c$ and satisfies $x_n \neq c$. Our goal
is to show that the image sequence $f(x_n)$ converges to $L$. This is most easily
seen using the topological formulation of the definition.

Let $\epsilon > 0$. Because we are assuming (i), Definition 4.2.1B implies that
there exists $V_\delta(c)$ with the property that all $x \in V_\delta(c)$ different from $c$ satisfy
$f(x) \in V_\epsilon(L)$. All we need to do then is argue that our particular sequence $(x_n)$
is eventually in $V_\delta(c)$. But we are assuming that $(x_n) \to c$. This implies that
there exists a point $x_N$ after which $x_n \in V_\delta(c)$. It follows that $n \geq N$ implies
$f(x_n) \in V_\epsilon(L)$, as desired.

($\Leftarrow$) For this implication we give a contrapositive proof, which is essentially
a proof by contradiction. Thus, we assume that statement (ii) is true, and
carefully negate statement (i). To say that

$$\lim_{x \to c} f(x) \neq L$$

means that there exists at least one particular $\epsilon_0 > 0$ for which no $\delta$ is a suitable
response. In other words, no matter what $\delta > 0$ we try, there will always be at
least one point

$$x \in V_\delta(c) \text{ with } x \neq c \text{ for which } f(x) \notin V_{\epsilon_0}(L).$$

Now consider $\delta_n = 1/n$. From the preceding discussion, it follows that for each
$n \in \mathbf{N}$ we may pick an $x_n \in V_{\delta_n}(c)$ with $x_n \neq c$ and $f(x_n) \notin V_{\epsilon_0}(L)$. But now
notice that the result of this is a sequence $(x_n) \to c$ with $x_n \neq c$, where the
image sequence $f(x_n)$ certainly does not converge to $L$.

Because this contradicts (ii), which we are assuming is true for this argument,
we may conclude that (i) must also hold. □
Theorem 4.2.3 has several useful corollaries. In addition to the previously
advertised  benefit  of  granting  us  some  short  proofs  of  statements  about  how 
functional  limits  interact  with  algebraic  combinations  of  functions,  we  also  get 
an  economical  way  of  establishing  that  certain  limits  do  not  exist. 


Corollary 4.2.4 (Algebraic Limit Theorem for Functional Limits). Let
$f$ and $g$ be functions defined on a domain $A \subseteq \mathbf{R}$, and assume $\lim_{x \to c} f(x) = L$
and $\lim_{x \to c} g(x) = M$ for some limit point $c$ of $A$. Then,

(i) $\lim_{x \to c} kf(x) = kL$ for all $k \in \mathbf{R}$,

(ii) $\lim_{x \to c} [f(x) + g(x)] = L + M$,

(iii) $\lim_{x \to c} [f(x)g(x)] = LM$, and

(iv) $\lim_{x \to c} f(x)/g(x) = L/M$, provided $M \neq 0$.

Proof.  These  follow  from  Theorem  4.2.3  and  the  Algebraic  Limit  Theorem  for 
sequences.  The  details  are  requested  in  Exercise  4.2.1.  □ 


Corollary 4.2.5 (Divergence Criterion for Functional Limits). Let $f$ be
a function defined on $A$, and let $c$ be a limit point of $A$. If there exist two
sequences $(x_n)$ and $(y_n)$ in $A$ with $x_n \neq c$ and $y_n \neq c$ and

$$\lim x_n = \lim y_n = c \quad \text{but} \quad \lim f(x_n) \neq \lim f(y_n),$$

then we can conclude that the functional limit $\lim_{x \to c} f(x)$ does not exist.

Example 4.2.6. Assuming the familiar properties of the sine function, let’s
show that $\lim_{x \to 0} \sin(1/x)$ does not exist (Fig. 4.5).

If $x_n = 1/(2n\pi)$ and $y_n = 1/(2n\pi + \pi/2)$, then $\lim(x_n) = \lim(y_n) = 0$.
However, $\sin(1/x_n) = 0$ for all $n \in \mathbf{N}$ while $\sin(1/y_n) = 1$. Thus,

$$\lim \sin(1/x_n) \neq \lim \sin(1/y_n),$$

so by Corollary 4.2.5, $\lim_{x \to 0} \sin(1/x)$ does not exist.
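
The two sequences in Example 4.2.6 are easy to tabulate. This is a rough floating-point sketch (the names `x_n` and `y_n` simply mirror the example), so the computed sines agree with 0 and 1 only up to rounding error.

```python
import math


def x_n(n: int) -> float:
    """x_n = 1/(2 n pi), chosen so that sin(1/x_n) = sin(2 n pi) = 0."""
    return 1 / (2 * n * math.pi)


def y_n(n: int) -> float:
    """y_n = 1/(2 n pi + pi/2), chosen so that sin(1/y_n) = 1."""
    return 1 / (2 * n * math.pi + math.pi / 2)


for n in (1, 10, 100, 1000):
    along_x = math.sin(1 / x_n(n))
    along_y = math.sin(1 / y_n(n))
    print(f"n = {n}: x_n = {x_n(n):.6f}, sin(1/x_n) = {along_x:.2e}, "
          f"y_n = {y_n(n):.6f}, sin(1/y_n) = {along_y:.6f}")
```

Both input sequences shrink toward zero while the two columns of outputs stay near 0 and near 1 respectively, which is exactly the situation that Corollary 4.2.5 rules out for a function with a limit at 0.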


Figure  4.5:  The  function  sin(l/x)  near  zero. 
Exercises


Exercise  4.2.1.  (a)  Supply  the  details  for  how  Corollary  4.2.4  part  (ii)  follows 

from  the  Sequential  Criterion  for  Functional  Limits  in  Theorem  4.2.3  and 
the  Algebraic  Limit  Theorem  for  sequences  proved  in  Chapter  2. 

(b) Now, write another proof of Corollary 4.2.4 part (ii) directly from Definition
4.2.1 without using the sequential criterion in Theorem 4.2.3.

(c)  Repeat  (a)  and  (b)  for  Corollary  4.2.4  part  (iii). 

Exercise 4.2.2. For each stated limit, find the largest possible $\delta$-neighborhood
that is a proper response to the given $\epsilon$ challenge.

(a) $\lim_{x \to 3}(5x - 6) = 9$, where $\epsilon = 1$.

(b) $\lim_{x \to 4} \sqrt{x} = 2$, where $\epsilon = 1$.

(c) $\lim_{x \to \pi} [[x]] = 3$, where $\epsilon = 1$. (The function $[[x]]$ returns the greatest
integer less than or equal to $x$.)

(d) $\lim_{x \to \pi} [[x]] = 3$, where $\epsilon = .01$.


Exercise  4.2.3.  Review  the  definition  of  Thomae’s  function  t(x)  from 
Section  4.1. 


(a) Construct three different sequences $(x_n)$, $(y_n)$, and $(z_n)$, each of which
converges to 1 without using the number 1 as a term in the sequence.

(b) Now, compute $\lim t(x_n)$, $\lim t(y_n)$, and $\lim t(z_n)$.

(c) Make an educated conjecture for $\lim_{x \to 1} t(x)$, and use Definition 4.2.1B to
verify the claim. (Given $\epsilon > 0$, consider the set of points $\{x \in \mathbf{R} : t(x) \geq \epsilon\}$.
Argue that all the points in this set are isolated.)

Exercise 4.2.4. Consider the reasonable but erroneous claim that

$$\lim_{x \to 10} 1/[[x]] = 1/10.$$

(a) Find the largest $\delta$ that represents a proper response to the challenge of
$\epsilon = 1/2$.

(b) Find the largest $\delta$ that represents a proper response to $\epsilon = 1/50$.

(c) Find the largest $\epsilon$ challenge for which there is no suitable $\delta$ response
possible.
Exercise 4.2.5. Use Definition 4.2.1 to supply a proper proof for the following
limit statements.

(a) $\lim_{x \to 2}(3x + 4) = 10$.

(b) $\lim_{x \to 0} x^3 = 0$.

(c) $\lim_{x \to 2}(x^2 + x - 1) = 5$.

(d) $\lim_{x \to 3} 1/x = 1/3$.

Exercise  4.2.6.  Decide  if  the  following  claims  are  true  or  false,  and  give  short 
justifications  for  each  conclusion. 

(a) If a particular $\delta$ has been constructed as a suitable response to a particular
$\epsilon$ challenge, then any smaller positive $\delta$ will also suffice.

(b) If $\lim_{x \to a} f(x) = L$ and $a$ happens to be in the domain of $f$, then $L = f(a)$.

(c) If $\lim_{x \to a} f(x) = L$, then $\lim_{x \to a} 3[f(x) - 2]^2 = 3(L - 2)^2$.

(d) If $\lim_{x \to a} f(x) = 0$, then $\lim_{x \to a} f(x)g(x) = 0$ for any function $g$ (with
domain equal to the domain of $f$).


Exercise 4.2.7. Let $g : A \to \mathbf{R}$ and assume that $f$ is a bounded function on $A$
in the sense that there exists $M > 0$ satisfying $|f(x)| \leq M$ for all $x \in A$.

Show that if $\lim_{x \to c} g(x) = 0$, then $\lim_{x \to c} g(x)f(x) = 0$ as well.


Exercise 4.2.8. Compute each limit or state that it does not exist. Use the
tools developed in this section to justify each conclusion.

(a) $\lim_{x \to 2}$

(b) $\lim_{x \to 7/4}$

(c) $\lim_{x \to 0} (-1)^{[[1/x]]}$

(d) $\lim_{x \to 0} \sqrt[3]{x}\,(-1)^{[[1/x]]}$


Exercise 4.2.9 (Infinite Limits). The statement $\lim_{x \to 0} 1/x^2 = \infty$ certainly
makes intuitive sense. To construct a rigorous definition in the challenge-response
style of Definition 4.2.1 for an infinite limit statement of this form,
we replace the (arbitrarily small) $\epsilon > 0$ challenge with an (arbitrarily large)
$M > 0$ challenge:

Definition: $\lim_{x \to c} f(x) = \infty$ means that for all $M > 0$ we can find a $\delta > 0$
such that whenever $0 < |x - c| < \delta$, it follows that $f(x) > M$.

(a) Show $\lim_{x \to 0} 1/x^2 = \infty$ in the sense described in the previous definition.

(b) Now, construct a definition for the statement $\lim_{x \to \infty} f(x) = L$. Show
$\lim_{x \to \infty} 1/x = 0$.

(c) What would a rigorous definition for $\lim_{x \to \infty} f(x) = \infty$ look like? Give
an example of such a limit.


Exercise 4.2.10 (Right and Left Limits). Introductory calculus courses
typically refer to the right-hand limit of a function as the limit obtained by
“letting $x$ approach $a$ from the right-hand side.”

(a) Give a proper definition in the style of Definition 4.2.1 for the right-hand
and left-hand limit statements:

$$\lim_{x \to a^+} f(x) = L \quad \text{and} \quad \lim_{x \to a^-} f(x) = M.$$

(b) Prove that $\lim_{x \to a} f(x) = L$ if and only if both the right and left-hand
limits equal $L$.

Exercise 4.2.11 (Squeeze Theorem). Let $f$, $g$, and $h$ satisfy $f(x) \leq g(x) \leq
h(x)$ for all $x$ in some common domain $A$. If $\lim_{x \to c} f(x) = L$ and $\lim_{x \to c} h(x) =
L$ at some limit point $c$ of $A$, show $\lim_{x \to c} g(x) = L$ as well.


4.3  Continuous  Functions 


We now come to a significant milestone in our progress toward a rigorous theory
of real-valued functions: a proper definition of the seminal concept of continuity
that avoids any intuitive appeals to “unbroken curves” or functions without
“jumps” or “holes.”


Definition 4.3.1 (Continuity). A function $f : A \to \mathbf{R}$ is continuous at a
point $c \in A$ if, for all $\epsilon > 0$, there exists a $\delta > 0$ such that whenever $|x - c| < \delta$
(and $x \in A$) it follows that $|f(x) - f(c)| < \epsilon$.

If $f$ is continuous at every point in the domain $A$, then we say that $f$ is
continuous on $A$.


The definition of continuity looks much like the definition for functional
limits, with a few subtle differences. The most important is that we require the
point $c$ to be in the domain of $f$. The value $f(c)$ then becomes the value of
$\lim_{x \to c} f(x)$. With this observation in mind, it is tempting to shorten Definition
4.3.1 to say that $f$ is continuous at $c \in A$ if

$$\lim_{x \to c} f(x) = f(c).$$

This is fine as long as $c$ is a limit point of $A$. If $c$ is an isolated point of $A$,
then $\lim_{x \to c} f(x)$ isn’t defined but Definition 4.3.1 can still be applied. An unremarkable
but noteworthy consequence of this definition is that functions are
continuous at isolated points of their domains (Exercise 4.3.5).

We saw in the previous section that, in addition to the standard $\epsilon$-$\delta$ definition,
functional  limits  have  a  useful  formulation  in  terms  of  sequences.  The  same  is 
true  of  continuity.  The  next  theorem  summarizes  these  various  equivalent  ways 
to  characterize  the  continuity  of  a  function  at  a  given  point. 
Theorem 4.3.2 (Characterizations of Continuity). Let $f : A \to \mathbf{R}$, and let
$c \in A$. The function $f$ is continuous at $c$ if and only if any one of the following
three conditions is met:

(i) For all $\epsilon > 0$, there exists a $\delta > 0$ such that $|x - c| < \delta$ (and $x \in A$) implies
$|f(x) - f(c)| < \epsilon$;

(ii) For all $V_\epsilon(f(c))$, there exists a $V_\delta(c)$ with the property that $x \in V_\delta(c)$ (and
$x \in A$) implies $f(x) \in V_\epsilon(f(c))$;

(iii) For all $(x_n) \to c$ (with $x_n \in A$), it follows that $f(x_n) \to f(c)$.

If $c$ is a limit point of $A$, then the above conditions are equivalent to

(iv) $\lim_{x \to c} f(x) = f(c)$.

Proof.  Statement  (i)  is  just  Definition  4.3.1,  and  statement  (ii)  is  the  standard 
rewording  of  (i)  using  topological  neighborhoods  in  place  of  the  absolute  value 
notation.  Statement  (iii)  is  equivalent  to  (i)  via  an  argument  nearly  identical  to 
that of Theorem 4.2.3, with some slight modifications for when $x_n = c$. Finally,
statement (iv) is seen to be equivalent to (i) by considering Definition 4.2.1 and
observing that the case $x = c$ (which is excluded in the definition of functional
limits) leads to the requirement $f(c) \in V_\epsilon(f(c))$, which is trivially true. □


The length of this list is somewhat deceiving. Statements (i), (ii), and (iv)
are closely related and essentially remind us that functional limits have an $\epsilon$-$\delta$
formulation as well as a topological description. Statement (iii), however, is
qualitatively different from the others. As a general rule, the sequential characterization
of continuity is typically the most useful for demonstrating that a
function is not continuous at some point.

Corollary 4.3.3 (Criterion for Discontinuity). Let $f : A \to \mathbf{R}$, and let
$c \in A$ be a limit point of $A$. If there exists a sequence $(x_n) \subseteq A$ where $(x_n) \to c$
but such that $f(x_n)$ does not converge to $f(c)$, we may conclude that $f$ is not
continuous at $c$.


The  sequential  characterization  of  continuity  is  also  important  for  the  other 
reasons  that  it  was  important  for  functional  limits.  In  particular,  it  allows 
us  to  bring  our  catalog  of  results  about  the  behavior  of  sequences  to  bear  on 
the  study  of  continuous  functions.  The  next  theorem  should  be  compared  to 
Corollary 4.2.4 as well as to Theorem 2.3.3.

Theorem 4.3.4 (Algebraic Continuity Theorem). Assume $f : A \to \mathbf{R}$ and
$g : A \to \mathbf{R}$ are continuous at a point $c \in A$. Then,

(i) $kf(x)$ is continuous at $c$ for all $k \in \mathbf{R}$;

(ii) $f(x) + g(x)$ is continuous at $c$;

(iii) $f(x)g(x)$ is continuous at $c$; and

(iv) $f(x)/g(x)$ is continuous at $c$, provided the quotient is defined.
Figure 4.6: The function $x \sin(1/x)$ near zero.


Proof.  All  of  these  statements  can  be  quickly  derived  from  Corollary  4.2.4  and 
Theorem  4.3.2.  □ 

These  results  provide  us  with  the  tools  we  need  to  firm  up  our  arguments 
in  the  opening  section  of  this  chapter  about  the  behavior  of  Dirichlet’s  function 
and  Thomae’s  function.  The  details  are  requested  in  Exercise  4.3.7.  Here  are 
some  more  examples  of  arguments  for  and  against  continuity  of  some  familiar 
functions. 

Example 4.3.5. All polynomials are continuous on $\mathbf{R}$. In fact, rational functions
(i.e., quotients of polynomials) are continuous wherever they are defined.

To see why this is so, consider the identity function $g(x) = x$. Because
$|g(x) - g(c)| = |x - c|$, we can respond to a given $\epsilon > 0$ by choosing $\delta = \epsilon$,
and it follows that $g$ is continuous on all of $\mathbf{R}$. It is even simpler to show that
a constant function $f(x) = k$ is continuous. (Letting $\delta = 1$ regardless of the
value of $\epsilon$ does the trick.) Because an arbitrary polynomial

$$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$$

consists of sums and products of $g(x)$ with different constant functions, we may
conclude from Theorem 4.3.4 that $p(x)$ is continuous.

Likewise, Theorem 4.3.4 implies that quotients of polynomials are continuous
as long as the denominator is not zero.

Example 4.3.6. In Example 4.2.6, we saw that the oscillations of $\sin(1/x)$ are
so rapid near the origin that $\lim_{x \to 0} \sin(1/x)$ does not exist. Now, consider the
function

$$g(x) = \begin{cases} x \sin(1/x) & \text{if } x \neq 0 \\ 0 & \text{if } x = 0. \end{cases}$$

To investigate the continuity of $g$ at $c = 0$ (Fig. 4.6), we can estimate

$$|g(x) - g(0)| = |x \sin(1/x) - 0| \leq |x|.$$

Given $\epsilon > 0$, set $\delta = \epsilon$, so that whenever $|x - 0| < \delta$ it follows that
$|g(x) - g(0)| < \epsilon$. Thus, $g$ is continuous at the origin.
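
The estimate $|g(x)| \leq |x|$ that drives Example 4.3.6 can be observed numerically. The following is a small sketch under floating-point arithmetic; the sample points are an arbitrary choice of ours, and the check illustrates the bound rather than proving it.

```python
import math


def g(x: float) -> float:
    """g(x) = x sin(1/x) for x != 0, and g(0) = 0."""
    return x * math.sin(1 / x) if x != 0 else 0.0


# Sample points shrinking toward 0 from both sides.
sample_points = [sign * 10.0 ** (-k) for k in range(1, 8) for sign in (1, -1)]

for x in sample_points:
    print(f"x = {x:+.0e}, g(x) = {g(x):+.3e}, |g(x)| <= |x|: {abs(g(x)) <= abs(x)}")

# With the response delta = eps, every sampled x with |x| < delta gives |g(x) - g(0)| < eps.
eps = 1e-3
delta = eps
print(all(abs(g(x) - g(0.0)) < eps for x in sample_points if abs(x) < delta))
```

The factor $x$ damps the oscillation of $\sin(1/x)$, so the outputs are trapped between $-|x|$ and $|x|$ no matter how wildly the sine term behaves.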


Example 4.3.7. Throughout the exercises we have been using the greatest
integer function $h(x) = [[x]]$, which for each $x \in \mathbf{R}$ returns the largest integer
$n \in \mathbf{Z}$ satisfying $n \leq x$. This familiar step function certainly has discontinuous
“jumps” at each integer value of its domain, but it is a useful exercise to try
and articulate this observation in the language of analysis.

Given $m \in \mathbf{Z}$, define the sequence $(x_n)$ by $x_n = m - 1/n$. It follows that
$(x_n) \to m$, but

\[ h(x_n) \to (m - 1), \]

which does not equal $m = h(m)$. By Corollary 4.3.3, we see that $h$ fails to be
continuous at each $m \in \mathbf{Z}$.

Now let’s see why $h$ is continuous at a point $c \notin \mathbf{Z}$. Given $\epsilon > 0$, we must find
a $\delta$-neighborhood $V_\delta(c)$ such that $x \in V_\delta(c)$ implies $h(x) \in V_\epsilon(h(c))$. We know
that $c \in \mathbf{R}$ falls between consecutive integers $n < c < n + 1$ for some $n \in \mathbf{Z}$.
If we take $\delta = \min\{c - n, (n + 1) - c\}$, then it follows from the definition of $h$
that $h(x) = h(c)$ for all $x \in V_\delta(c)$. Thus, we certainly have

\[ h(x) \in V_\epsilon(h(c)) \]

whenever $x \in V_\delta(c)$.

This latter proof is quite different from the typical situation in that the value
of $\delta$ does not actually depend on the choice of $\epsilon$. Usually, a smaller $\epsilon$ requires a
smaller $\delta$ in response, but here the same value of $\delta$ works no matter how small
$\epsilon$ is chosen.
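Both halves of this example can be illustrated numerically. The sketch below, which assumes Python's `math.floor` as a stand-in for the greatest integer function, evaluates $h$ along the sequence $x_n = m - 1/n$ and then checks that $h$ is constant on sampled points of $V_\delta(c)$ with $\delta = \min\{c - n, (n + 1) - c\}$. The function names are hypothetical.

```python
import math


def h(x):
    """Greatest integer function: the largest integer n with n <= x."""
    return math.floor(x)


def jump_sequence(m, terms=5):
    """Values h(m - 1/n) for n = 1, ..., terms."""
    return [h(m - 1.0 / n) for n in range(1, terms + 1)]


def constant_near(c, samples=200):
    """Check h(x) == h(c) on sampled x in V_delta(c) for a non-integer c."""
    n = math.floor(c)
    delta = min(c - n, (n + 1) - c)  # the response from the example
    return all(
        h(c + delta * k / samples) == h(c)
        for k in range(-samples + 1, samples)
    )


if __name__ == "__main__":
    print(jump_sequence(3))    # every term is m - 1 = 2, while h(3) = 3
    print(constant_near(2.3))
```

Note that `constant_near` never looks at $\epsilon$ at all, which mirrors the remark that the same $\delta$ works for every $\epsilon$.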


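Example 4.3.8. Consider $f(x) = \sqrt{x}$ defined on $A = \{x \in \mathbf{R} : x \geq 0\}$.
Exercise 2.3.1 outlines a sequential proof that $f$ is continuous on $A$. Here, we
give an $\epsilon$-$\delta$ proof of the same fact.

Let $\epsilon > 0$. We need to argue that $|f(x) - f(c)|$ can be made less than $\epsilon$ for
all values of $x$ in some $\delta$-neighborhood around $c$. If $c = 0$, this reduces to the
statement $\sqrt{x} < \epsilon$, which happens as long as $x < \epsilon^2$. Thus, if we choose $\delta = \epsilon^2$,
we see that $|x - 0| < \delta$ implies $|f(x) - 0| < \epsilon$.

For a point $c \in A$ different from zero, we need to estimate $|\sqrt{x} - \sqrt{c}|$. This
time, write

\[ |\sqrt{x} - \sqrt{c}| = |\sqrt{x} - \sqrt{c}| \left( \frac{\sqrt{x} + \sqrt{c}}{\sqrt{x} + \sqrt{c}} \right) = \frac{|x - c|}{\sqrt{x} + \sqrt{c}} \leq \frac{|x - c|}{\sqrt{c}}. \]

In order to make this quantity less than $\epsilon$, it suffices to pick $\delta = \epsilon \sqrt{c}$. Then,
$|x - c| < \delta$ implies

\[ |\sqrt{x} - \sqrt{c}| < \frac{\epsilon \sqrt{c}}{\sqrt{c}} = \epsilon, \]

as desired.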
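The two responses used in this example, $\delta = \epsilon^2$ at $c = 0$ and $\delta = \epsilon \sqrt{c}$ for $c > 0$, can be spot-checked with a short Python sketch. The names `delta_for` and `respects_epsilon` are illustrative assumptions; the code only samples finitely many points of the $\delta$-neighborhood, so it illustrates the estimate rather than proving it.

```python
import math


def delta_for(c, eps):
    """Response from the example: eps**2 at c = 0, eps*sqrt(c) for c > 0."""
    return eps ** 2 if c == 0 else eps * math.sqrt(c)


def respects_epsilon(c, eps, samples=1_000):
    """Check |sqrt(x) - sqrt(c)| < eps at sampled x >= 0 with |x - c| < delta."""
    delta = delta_for(c, eps)
    for k in range(-samples + 1, samples):
        x = c + delta * k / samples
        if x < 0:
            continue  # outside the domain A = [0, infinity)
        if not abs(math.sqrt(x) - math.sqrt(c)) < eps:
            return False
    return True


if __name__ == "__main__":
    for c, eps in ((0.0, 0.1), (4.0, 0.5), (0.01, 0.5)):
        print(c, eps, delta_for(c, eps), respects_epsilon(c, eps))
```

Unlike the greatest integer example, the response here depends on both $\epsilon$ and the point $c$.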

Although we have now shown that both polynomials and the square root
function are continuous, the Algebraic Continuity Theorem does not provide
the justification needed to conclude that a function such as $h(x) = \sqrt{3x^2 + 5}$ is
continuous. For this, we must prove that compositions of continuous functions
are continuous.

Theorem 4.3.9 (Composition of Continuous Functions). Given $f : A \to \mathbf{R}$
and $g : B \to \mathbf{R}$, assume that the range $f(A) = \{f(x) : x \in A\}$ is contained in
the domain $B$ so that the composition $g \circ f(x) = g(f(x))$ is defined on $A$.

If $f$ is continuous at $c \in A$, and if $g$ is continuous at $f(c) \in B$, then $g \circ f$ is
continuous at $c$.

Proof.  Exercise  4.3.3.  □ 


Exercises 


Exercise 4.3.1. Let $g(x) = \sqrt[3]{x}$.

(a) Prove that $g$ is continuous at $c = 0$.

(b) Prove that $g$ is continuous at a point $c \neq 0$. (The identity
$a^3 - b^3 = (a - b)(a^2 + ab + b^2)$ will be helpful.)

Exercise 4.3.2. To gain a deeper understanding of the relationship between
$\epsilon$ and $\delta$ in the definition of continuity, let’s explore some modest variations of
Definition 4.3.1. In all of these, let $f$ be a function defined on all of $\mathbf{R}$.

(a) Let’s say $f$ is onetinuous at $c$ if for all $\epsilon > 0$ we can choose $\delta = 1$ and it
follows that $|f(x) - f(c)| < \epsilon$ whenever $|x - c| < \delta$. Find an example of a
function that is onetinuous on all of $\mathbf{R}$.

(b) Let’s say $f$ is equaltinuous at $c$ if for all $\epsilon > 0$ we can choose $\delta = \epsilon$ and it
follows that $|f(x) - f(c)| < \epsilon$ whenever $|x - c| < \delta$. Find an example of a
function that is equaltinuous on $\mathbf{R}$ that is nowhere onetinuous, or explain
why there is no such function.

(c) Let’s say $f$ is lesstinuous at $c$ if for all $\epsilon > 0$ we can choose $0 < \delta < \epsilon$ and
it follows that $|f(x) - f(c)| < \epsilon$ whenever $|x - c| < \delta$. Find an example of a
function that is lesstinuous on $\mathbf{R}$ that is nowhere equaltinuous, or explain
why there is no such function.

(d) Is every lesstinuous function continuous? Is every continuous function
lesstinuous? Explain.


Exercise 4.3.3. (a) Supply a proof for Theorem 4.3.9 using the $\epsilon$-$\delta$
characterization of continuity.


(b)  Give  another  proof  of  this  theorem  using  the  sequential  characterization 
of  continuity  (from  Theorem  4.3.2  (iii)). 

Exercise 4.3.4. Assume $f$ and $g$ are defined on all of $\mathbf{R}$ and that $\lim_{x \to p} f(x) = q$
and $\lim_{x \to q} g(x) = r$.

(a) Give an example to show that it may not be true that

\[ \lim_{x \to p} g(f(x)) = r. \]

(b) Show that the result in (a) does follow if we assume $f$ and $g$ are continuous.

(c) Does the result in (a) hold if we only assume $f$ is continuous? How about
if we only assume that $g$ is continuous?

Exercise 4.3.5. Show using Definition 4.3.1 that if $c$ is an isolated point of
$A \subseteq \mathbf{R}$, then $f : A \to \mathbf{R}$ is continuous at $c$.


Exercise  4.3.6.  Provide  an  example  of  each  or  explain  why  the  request  is 
impossible. 


(a) Two functions $f$ and $g$, neither of which is continuous at 0 but such that
$f(x)g(x)$ and $f(x) + g(x)$ are continuous at 0.

(b) A function $f(x)$ continuous at 0 and $g(x)$ not continuous at 0 such that
$f(x) + g(x)$ is continuous at 0.

(c) A function $f(x)$ continuous at 0 and $g(x)$ not continuous at 0 such that
$f(x)g(x)$ is continuous at 0.

(d) A function $f(x)$ not continuous at 0 such that $f(x) + \frac{1}{f(x)}$ is continuous
at 0.


(e) A function $f(x)$ not continuous at 0 such that $[f(x)]^3$ is continuous at 0.


Exercise 4.3.7. (a) Referring to the proper theorems, give a formal argument
that Dirichlet’s function from Section 4.1 is nowhere-continuous
on $\mathbf{R}$.

(b) Review the definition of Thomae’s function in Section 4.1 and demonstrate
that it fails to be continuous at every rational point.

(c) Use the characterization of continuity in Theorem 4.3.2 (iii) to show that
Thomae’s function is continuous at every irrational point in $\mathbf{R}$. (Given
$\epsilon > 0$, consider the set of points $\{x \in \mathbf{R} : t(x) \geq \epsilon\}$.)


Exercise  4.3.8.  Decide  if  the  following  claims  are  true  or  false,  providing  either 
a  short  proof  or  counterexample  to  justify  each  conclusion.  Assume  throughout 
that  g  is  defined  and  continuous  on  all  of  R. 


(a) If $g(x) \geq 0$ for all $x < 1$, then $g(1) \geq 0$ as well.

(b) If $g(r) = 0$ for all $r \in \mathbf{Q}$, then $g(x) = 0$ for all $x \in \mathbf{R}$.

(c) If $g(x_0) > 0$ for a single point $x_0 \in \mathbf{R}$, then $g(x)$ is in fact strictly positive
for uncountably many points.

Exercise 4.3.9. Assume $h : \mathbf{R} \to \mathbf{R}$ is continuous on $\mathbf{R}$ and let
$K = \{x : h(x) = 0\}$. Show that $K$ is a closed set.

Exercise 4.3.10. Observe that if $a$ and $b$ are real numbers, then

\[ \max\{a, b\} = \frac{1}{2} \left[ (a + b) + |a - b| \right]. \]

(a) Show that if $f_1, f_2, \ldots, f_n$ are continuous functions, then

\[ g(x) = \max\{f_1(x), f_2(x), \ldots, f_n(x)\} \]

is a continuous function.

(b) Let’s explore whether the result in (a) extends to the infinite case. For
each $n \in \mathbf{N}$, define $f_n$ on $\mathbf{R}$ by

\[ f_n(x) = \begin{cases} 1 & \text{if } |x| \geq 1/n \\ n|x| & \text{if } |x| < 1/n. \end{cases} \]

Now explicitly compute $h(x) = \sup\{f_1(x), f_2(x), f_3(x), \ldots\}$.

Exercise 4.3.11 (Contraction Mapping Theorem). Let $f$ be a function
defined on all of $\mathbf{R}$, and assume there is a constant $c$ such that $0 < c < 1$ and

\[ |f(x) - f(y)| \leq c|x - y| \]

for all $x, y \in \mathbf{R}$.

(a) Show that $f$ is continuous on $\mathbf{R}$.

(b) Pick some point $y_1 \in \mathbf{R}$ and construct the sequence

\[ (y_1, f(y_1), f(f(y_1)), \ldots). \]

In general, if $y_{n+1} = f(y_n)$, show that the resulting sequence $(y_n)$ is a
Cauchy sequence. Hence we may let $y = \lim y_n$.

(c) Prove that $y$ is a fixed point of $f$ (i.e., $f(y) = y$) and that it is unique in
this regard.

(d) Finally, prove that if $x$ is any arbitrary point in $\mathbf{R}$, then the sequence
$(x, f(x), f(f(x)), \ldots)$ converges to $y$ defined in (b).
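
To get a feel for the iteration in this exercise before proving anything, one can run it on a concrete contraction. The Python sketch below uses the hypothetical choice $f(x) = \frac{1}{2}\cos x$, which satisfies the hypothesis with $c = 1/2$ because $|\cos x - \cos y| \leq |x - y|$. It is an experiment under that assumption, not a solution to the exercise.

```python
import math


def f(x):
    # Hypothetical contraction with constant c = 1/2 (not taken from the text).
    return 0.5 * math.cos(x)


def iterate(func, start, steps=60):
    """Return the orbit (start, func(start), func(func(start)), ...) with steps + 1 terms."""
    orbit = [start]
    for _ in range(steps):
        orbit.append(func(orbit[-1]))
    return orbit


if __name__ == "__main__":
    from_zero = iterate(f, 0.0)[-1]
    from_ten = iterate(f, 10.0)[-1]
    print(from_zero, from_ten)            # the two orbits settle on the same value
    print(abs(f(from_zero) - from_zero))  # numerically a fixed point
```

Starting the orbit from two different points and landing on the same limit is the numerical counterpart of parts (c) and (d).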

Exercise 4.3.12. Let $F \subseteq \mathbf{R}$ be a nonempty closed set and define
$g(x) = \inf\{|x - a| : a \in F\}$. Show that $g$ is continuous on all of $\mathbf{R}$ and $g(x) \neq 0$ for
all $x \notin F$.


Exercise 4.3.13. Let $f$ be a function defined on all of $\mathbf{R}$ that satisfies the
additive condition $f(x + y) = f(x) + f(y)$ for all $x, y \in \mathbf{R}$.

(a) Show that $f(0) = 0$ and that $f(-x) = -f(x)$ for all $x \in \mathbf{R}$.

(b) Let $k = f(1)$. Show that $f(n) = kn$ for all $n \in \mathbf{N}$, and then prove that
$f(z) = kz$ for all $z \in \mathbf{Z}$. Now, prove that $f(r) = kr$ for any rational
number $r$.

(c) Show that if $f$ is continuous at $x = 0$, then $f$ is continuous at every point
in $\mathbf{R}$ and conclude that $f(x) = kx$ for all $x \in \mathbf{R}$. Thus, any additive
function that is continuous at $x = 0$ must necessarily be a linear function
through the origin.

Exercise 4.3.14. (a) Let $F$ be a closed set. Construct a function $f : \mathbf{R} \to \mathbf{R}$
such that the set of points where $f$ fails to be continuous is precisely $F$.
(The concept of the interior of a set, discussed in Exercise 3.2.14, may be
useful.)

(b) Now consider an open set $O$. Construct a function $g : \mathbf{R} \to \mathbf{R}$ whose set
of discontinuous points is precisely $O$. (For this problem, the function in
Exercise 4.3.12 may be useful.)

4.4  Continuous  Functions  on  Compact  Sets 

Given a function $f : A \to \mathbf{R}$ and a subset $B \subseteq A$, the notation $f(B)$ refers to
the range of $f$ over the set $B$; that is,

\[ f(B) = \{f(x) : x \in B\}. \]

The adjectives open, closed, bounded, compact, perfect, and connected are
all used to describe subsets of the real line. An interesting question is to sort
out which, if any, of these properties are preserved when a particular set $B$ is
mapped to $f(B)$ via a continuous function. For instance, if $B$ is open and $f$
is continuous, is $f(B)$ necessarily open? The answer to this question is no. If
$f(x) = x^2$ and $B$ is the open interval $(-1, 1)$, then $f(B)$ is the interval $[0, 1)$,
which is not open.

The corresponding conjecture for closed sets also turns out to be false,
although constructing a counterexample requires a little more thought. Consider
the function

\[ g(x) = \frac{1}{1 + x^2} \]

and the closed set $B = [0, \infty) = \{x : x \geq 0\}$. Because $g(B) = (0, 1]$ is not
closed,  we  must  conclude  that  continuous  functions  do  not,  in  general,  map 
closed  sets  to  closed  sets.  Notice,  however,  that  our  particular  counterexample 
required  using  an  unbounded  closed  set  B.  This  is  not  incidental.  Sets  that  are 
closed  and  bounded — that  is,  compact  sets — always  get  mapped  to  closed  and 
bounded  subsets  by  continuous  functions. 

Theorem 4.4.1 (Preservation of Compact Sets). Let $f : A \to \mathbf{R}$ be
continuous on $A$. If $K \subseteq A$ is compact, then $f(K)$ is compact as well.

Proof. Let $(y_n)$ be an arbitrary sequence contained in the range set $f(K)$.
To prove this result, we must find a subsequence $(y_{n_k})$, which converges to
a limit also in $f(K)$. The strategy is to take advantage of the assumption that
the domain set $K$ is compact by translating the sequence $(y_n)$, which is in the
range of $f$, back to a sequence in the domain $K$.

To assert that $(y_n) \subseteq f(K)$ means that, for each $n \in \mathbf{N}$, we can find (at least
one) $x_n \in K$ with $f(x_n) = y_n$. This yields a sequence $(x_n) \subseteq K$. Because $K$ is
compact, there exists a convergent subsequence $(x_{n_k})$ whose limit $x = \lim x_{n_k}$
is also in $K$. Finally, we make use of the fact that $f$ is assumed to be continuous
on $A$ and so is continuous at $x$ in particular. Given that $(x_{n_k}) \to x$, we conclude
that $(y_{n_k}) \to f(x)$. Because $x \in K$, we have that $f(x) \in f(K)$, and hence $f(K)$
is compact. □

An  extremely  important  corollary  is  obtained  by  combining  this  result  with 
the  observation  that  compact  sets  are  bounded  and  contain  their  supremums 
and  infimums. 

Theorem 4.4.2 (Extreme Value Theorem). If $f : K \to \mathbf{R}$ is continuous on
a compact set $K \subseteq \mathbf{R}$, then $f$ attains a maximum and minimum value. In other
words, there exist $x_0, x_1 \in K$ such that $f(x_0) \leq f(x) \leq f(x_1)$ for all $x \in K$.

Proof. Because $f(K)$ is compact, we can set $\alpha = \sup f(K)$ and know $\alpha \in f(K)$
(Exercise 3.3.1). It follows that there exists $x_1 \in K$ with $\alpha = f(x_1)$. The
argument for the minimum value is similar. □


Uniform  Continuity 

Although we have proved that polynomials are always continuous on $\mathbf{R}$, there
is an important lesson to be learned by constructing direct proofs that the
functions $f(x) = 3x + 1$ and $g(x) = x^2$ (previously studied in Example 4.2.2)
are everywhere continuous.

Example 4.4.3. (i) To show directly that $f(x) = 3x + 1$ is continuous at
an arbitrary point $c \in \mathbf{R}$, we must argue that $|f(x) - f(c)|$ can be made
arbitrarily small for values of $x$ near $c$. Now,

\[ |f(x) - f(c)| = |(3x + 1) - (3c + 1)| = 3|x - c|, \]

so, given $\epsilon > 0$, we choose $\delta = \epsilon/3$. Then, $|x - c| < \delta$ implies

\[ |f(x) - f(c)| = 3|x - c| < 3 \left( \frac{\epsilon}{3} \right) = \epsilon. \]

Of particular importance for this discussion is the fact that the choice of
$\delta$ is the same regardless of which point $c \in \mathbf{R}$ we are considering.

(ii) Let’s contrast this with what happens when we prove $g(x) = x^2$ is
continuous on $\mathbf{R}$. Given $c \in \mathbf{R}$, we have

\[ |g(x) - g(c)| = |x^2 - c^2| = |x - c||x + c|. \]

As discussed in Example 4.2.2, we need an upper bound on $|x + c|$, which
is obtained by insisting that our choice of $\delta$ not exceed 1. This guarantees
that all values of $x$ under consideration will necessarily fall in the interval
$(c - 1, c + 1)$. It follows that

\[ |x + c| \leq |x| + |c| \leq (|c| + 1) + |c| = 2|c| + 1. \]

Now, let $\epsilon > 0$. If we choose $\delta = \min\{1, \epsilon/(2|c| + 1)\}$, then $|x - c| < \delta$
implies

\[ |g(x) - g(c)| = |x - c||x + c| < \left( \frac{\epsilon}{2|c| + 1} \right) (2|c| + 1) = \epsilon. \]

Now, there is nothing deficient about this argument, but it is important
to notice that, in the second proof, the algorithm for choosing the response $\delta$
depends on the value of $c$. The statement

\[ \delta = \frac{\epsilon}{2|c| + 1} \]

means that larger values of $c$ are going to require smaller values of $\delta$, a fact
that should be evident from a consideration of the graph of $g(x) = x^2$ (Fig. 4.7).
Given, say, $\epsilon = 1$, a response of $\delta = 1/3$ is sufficient for $c = 1$ because
$2/3 < x < 4/3$ certainly implies $0 < x^2 < 2$. However, if $c = 10$, then the steepness
of the graph of $g(x)$ means that a much smaller $\delta$ is required ($\delta = 1/21$ by our
rule) to force $99 < x^2 < 101$.
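
The dependence of the response on $c$ is easy to tabulate. The following Python sketch encodes the rule $\delta = \min\{1, \epsilon/(2|c| + 1)\}$ with exact rational arithmetic (via the standard `fractions` module) and spot-checks it on sampled points. The names `delta_for` and `respects_epsilon` are illustrative assumptions.

```python
from fractions import Fraction


def delta_for(c, eps):
    """The rule from the example: delta = min{1, eps / (2|c| + 1)}."""
    return min(Fraction(1), Fraction(eps) / (2 * abs(Fraction(c)) + 1))


def respects_epsilon(c, eps, samples=100):
    """Check |x^2 - c^2| < eps at sampled x with |x - c| < delta (exact arithmetic)."""
    c, eps = Fraction(c), Fraction(eps)
    delta = delta_for(c, eps)
    return all(
        abs((c + delta * Fraction(k, samples)) ** 2 - c ** 2) < eps
        for k in range(-samples + 1, samples)
    )


if __name__ == "__main__":
    print(delta_for(1, 1))    # 1/3
    print(delta_for(10, 1))   # 1/21
    print(respects_epsilon(10, 1))
```

For a fixed $\epsilon$, the values returned by `delta_for` shrink toward zero as $|c|$ grows, so this rule never yields a single response that serves every $c \in \mathbf{R}$.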

The  next  definition  is  meant  to  distinguish  between  these  two  examples. 


[Figure 4.7: $g(x) = x^2$; a larger $c$ requires a smaller $\delta$.]

Definition 4.4.4 (Uniform Continuity). A function $f : A \to \mathbf{R}$ is uniformly
continuous on $A$ if for every $\epsilon > 0$ there exists a $\delta > 0$ such that for all $x, y \in A$,
$|x - y| < \delta$ implies $|f(x) - f(y)| < \epsilon$.

Recall that to say that “$f$ is continuous on $A$” means that $f$ is continuous at
each individual point $c \in A$. In other words, given $\epsilon > 0$ and $c \in A$, we can find
a $\delta > 0$ perhaps depending on $c$ such that if $|x - c| < \delta$, then $|f(x) - f(c)| < \epsilon$.
Uniform continuity is a strictly stronger property. The key distinction between
asserting that $f$ is “uniformly continuous on $A$” versus simply “continuous on $A$”
is that, given an $\epsilon > 0$, a single $\delta > 0$ can be chosen that works simultaneously
for all points $c$ in $A$. To say that a function is not uniformly continuous on a set
$A$, then, does not necessarily mean it is not continuous at some point. Rather, it
means that there is some $\epsilon_0 > 0$ for which no single $\delta > 0$ is a suitable response
for all $c \in A$.


Theorem 4.4.5 (Sequential Criterion for Absence of Uniform Continuity).
A function $f : A \to \mathbf{R}$ fails to be uniformly continuous on $A$ if and
only if there exists a particular $\epsilon_0 > 0$ and two sequences $(x_n)$ and $(y_n)$ in $A$
satisfying

\[ |x_n - y_n| \to 0 \quad \text{but} \quad |f(x_n) - f(y_n)| \geq \epsilon_0. \]

Proof. The negation of Definition 4.4.4 states that $f$ is not uniformly continuous
on $A$ if and only if there exists $\epsilon_0 > 0$ such that for all $\delta > 0$ we can find two
points $x$ and $y$ satisfying $|x - y| < \delta$ but with $|f(x) - f(y)| \geq \epsilon_0$. Thus, if
we set $\delta_1 = 1$, then there exist two points $x_1$ and $y_1$ where $|x_1 - y_1| < 1$ but
$|f(x_1) - f(y_1)| \geq \epsilon_0$.

In a similar way, if we set $\delta_n = 1/n$ where $n \in \mathbf{N}$, it follows that there
exist points $x_n$ and $y_n$ with $|x_n - y_n| < 1/n$ but where $|f(x_n) - f(y_n)| \geq \epsilon_0$.
The resulting sequences $(x_n)$ and $(y_n)$ satisfy the requirements described in the
theorem.

Conversely, if $\epsilon_0$, $(x_n)$ and $(y_n)$ exist as described, it is straightforward to
see that no $\delta > 0$ is a suitable response for $\epsilon_0$. □

Example 4.4.6. The function $h(x) = \sin(1/x)$ (Fig. 4.5) is continuous at every
point in the open interval $(0, 1)$ but is not uniformly continuous on this interval.
The problem arises near zero, where the increasingly rapid oscillations take
domain values that are quite close together to range values a distance 2 apart.
To illustrate Theorem 4.4.5, take $\epsilon_0 = 2$ and set

\[ x_n = \frac{1}{\pi/2 + 2n\pi} \quad \text{and} \quad y_n = \frac{1}{3\pi/2 + 2n\pi}. \]

Because each of these sequences tends to zero, we have $|x_n - y_n| \to 0$, and a
short calculation reveals $|h(x_n) - h(y_n)| = 2$ for all $n \in \mathbf{N}$.
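
The "short calculation" can also be observed numerically. The Python sketch below evaluates the two sequences in floating point, so the second gap equals 2 only up to rounding error. The helper name `gaps` is an assumption introduced for illustration.

```python
import math


def h(x):
    return math.sin(1.0 / x)


def x_n(n):
    return 1.0 / (math.pi / 2 + 2 * n * math.pi)


def y_n(n):
    return 1.0 / (3 * math.pi / 2 + 2 * n * math.pi)


def gaps(n):
    """Return (|x_n - y_n|, |h(x_n) - h(y_n)|) for the given n."""
    return abs(x_n(n) - y_n(n)), abs(h(x_n(n)) - h(y_n(n)))


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        domain_gap, range_gap = gaps(n)
        print(n, domain_gap, range_gap)
```

The first column of output shrinks toward zero while the second stays at 2 up to rounding, which is precisely the pattern Theorem 4.4.5 asks for.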


Whereas continuity is defined at a single point, uniform continuity is always
discussed in reference to a particular domain. In Example 4.4.3, we were not
able to prove that $g(x) = x^2$ is uniformly continuous on $\mathbf{R}$ because larger
values of $x$ require smaller and smaller values of $\delta$. (As another illustration
of Theorem 4.4.5, take $x_n = n$ and $y_n = n + 1/n$.) It is true, however, that
$g(x)$ is uniformly continuous on the bounded set $[-10, 10]$. Returning to the
argument set forth in Example 4.4.3 (ii), notice that if we restrict our attention
to the domain $[-10, 10]$, then $|x + y| \leq 20$ for all $x$ and $y$. Given $\epsilon > 0$, we can
now choose $\delta = \epsilon/20$, and verify that if $x, y \in [-10, 10]$ satisfy $|x - y| < \delta$, then

\[ |g(x) - g(y)| = |x - y||x + y| < \left( \frac{\epsilon}{20} \right) 20 = \epsilon. \]

In fact, it is not difficult to see how to modify this argument to show that $g(x)$
is uniformly continuous on any bounded set $A$ in $\mathbf{R}$.
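
Both claims in this paragraph lend themselves to an exact check. In the Python sketch below, `sequence_gap` computes the two gaps for $x_n = n$ and $y_n = n + 1/n$ with rational arithmetic, and `bounded_ok` spot-checks the response $\delta = \epsilon/20$ on a grid in $[-10, 10]$. Both function names and the grid are assumptions made for illustration.

```python
from fractions import Fraction


def g(x):
    return x * x


def sequence_gap(n):
    """For x_n = n and y_n = n + 1/n, return (|x_n - y_n|, |g(x_n) - g(y_n)|)."""
    x = Fraction(n)
    y = n + Fraction(1, n)
    return abs(x - y), abs(g(x) - g(y))


def bounded_ok(eps, samples=40):
    """On [-10, 10], check that delta = eps/20 works at sampled pairs x, y."""
    eps = Fraction(eps)
    delta = eps / 20
    grid = [Fraction(-10) + Fraction(20 * i, samples) for i in range(samples + 1)]
    for x in grid:
        for k in range(-9, 10):
            y = x + delta * Fraction(k, 10)  # |x - y| < delta
            if -10 <= y <= 10 and not abs(g(x) - g(y)) < eps:
                return False
    return True


if __name__ == "__main__":
    print(sequence_gap(5))   # (Fraction(1, 5), Fraction(51, 25))
    print(bounded_ok(1))
```

Since $(n + 1/n)^2 - n^2 = 2 + 1/n^2$, the range gap never drops below 2 even though the domain gap $1/n$ tends to zero, so $\epsilon_0 = 2$ serves in Theorem 4.4.5.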

Now, Example 4.4.6 is included to keep us from jumping to the erroneous
conclusion that functions that are continuous on bounded domains are
necessarily uniformly continuous. A general result does follow, however, if we assume
that the domain is compact.


Theorem 4.4.7 (Uniform Continuity on Compact Sets). A function that
is continuous on a compact set $K$ is uniformly continuous on $K$.


Proof. Assume $f : K \to \mathbf{R}$ is continuous at every point of a compact set $K \subseteq \mathbf{R}$.
To prove that $f$ is uniformly continuous on $K$ we argue by contradiction.

By the criterion in Theorem 4.4.5, if $f$ is not uniformly continuous on $K$,
then there exist two sequences $(x_n)$ and $(y_n)$ in $K$ such that

\[ \lim |x_n - y_n| = 0 \quad \text{while} \quad |f(x_n) - f(y_n)| \geq \epsilon_0 \]

for some particular $\epsilon_0 > 0$. Because $K$ is compact, the sequence $(x_n)$ has a
convergent subsequence $(x_{n_k})$ with $x = \lim x_{n_k}$ also in $K$.

We could use the compactness of $K$ again to produce a convergent subsequence
of $(y_n)$, but notice what happens when we consider the particular
subsequence $(y_{n_k})$ consisting of those terms in $(y_n)$ that correspond to the terms
in the convergent subsequence $(x_{n_k})$. By the Algebraic Limit Theorem,

\[ \lim(y_{n_k}) = \lim((y_{n_k} - x_{n_k}) + x_{n_k}) = 0 + x. \]

The conclusion is that both $(x_{n_k})$ and $(y_{n_k})$ converge to $x \in K$. Because $f$ is
assumed to be continuous at $x$, we have $\lim f(x_{n_k}) = f(x)$ and $\lim f(y_{n_k}) = f(x)$,
which implies

\[ \lim(f(x_{n_k}) - f(y_{n_k})) = 0. \]

A contradiction arises when we recall that $(x_n)$ and $(y_n)$ were chosen to satisfy

\[ |f(x_n) - f(y_n)| \geq \epsilon_0 \]

for all $n \in \mathbf{N}$. We conclude, then, that $f$ is indeed uniformly continuous on $K$. □




Exercises 


Exercise 4.4.1. (a) Show that $f(x) = x^3$ is continuous on all of $\mathbf{R}$.

(b) Argue, using Theorem 4.4.5, that $f$ is not uniformly continuous on $\mathbf{R}$.

(c) Show that $f$ is uniformly continuous on any bounded subset of $\mathbf{R}$.

Exercise 4.4.2. (a) Is $f(x) = 1/x$ uniformly continuous on $(0, 1)$?

(b) Is $g(x) = \sqrt{x^2 + 1}$ uniformly continuous on $(0, 1)$?

(c) Is $h(x) = x \sin(1/x)$ uniformly continuous on $(0, 1)$?

Exercise 4.4.3. Show that $f(x) = 1/x^2$ is uniformly continuous on the set
$[1, \infty)$ but not on the set $(0, 1]$.

Exercise  4.4.4.  Decide  whether  each  of  the  following  statements  is  true  or 
false,  justifying  each  conclusion. 

(a) If $f$ is continuous on $[a, b]$ with $f(x) > 0$ for all $a \leq x \leq b$, then $1/f$ is
bounded on $[a, b]$ (meaning $1/f$ has bounded range).

(b) If $f$ is uniformly continuous on a bounded set $A$, then $f(A)$ is bounded.

(c) If $f$ is defined on $\mathbf{R}$ and $f(K)$ is compact whenever $K$ is compact, then $f$
is continuous on $\mathbf{R}$.


Exercise 4.4.5. Assume that $g$ is defined on an open interval $(a, c)$ and it is
known to be uniformly continuous on $(a, b]$ and $[b, c)$, where $a < b < c$. Prove
that $g$ is uniformly continuous on $(a, c)$.

Exercise  4.4.6.  Give  an  example  of  each  of  the  following,  or  state  that  such  a 
request  is  impossible.  For  any  that  are  impossible,  supply  a  short  explanation 
for  why  this  is  the  case. 

(a) A continuous function $f : (0, 1) \to \mathbf{R}$ and a Cauchy sequence $(x_n)$ such
that $f(x_n)$ is not a Cauchy sequence;

(b) A uniformly continuous function $f : (0, 1) \to \mathbf{R}$ and a Cauchy sequence
$(x_n)$ such that $f(x_n)$ is not a Cauchy sequence;

(c) A continuous function $f : [0, \infty) \to \mathbf{R}$ and a Cauchy sequence $(x_n)$ such
that $f(x_n)$ is not a Cauchy sequence.

Exercise 4.4.7. Prove that $f(x) = \sqrt{x}$ is uniformly continuous on $[0, \infty)$.

Exercise  4.4.8.  Give  an  example  of  each  of  the  following,  or  provide  a  short 
argument  for  why  the  request  is  impossible. 


(a)  A  continuous  function  defined  on  [0, 1]  with  range  (0, 1). 

(b)  A  continuous  function  defined  on  (0, 1)  with  range  [0, 1]. 

(c) A continuous function defined on (0, 1] with range (0, 1).

Exercise 4.4.9 (Lipschitz Functions). A function $f : A \to \mathbf{R}$ is called
Lipschitz if there exists a bound $M > 0$ such that

\[ \left| \frac{f(x) - f(y)}{x - y} \right| \leq M \]

for all $x \neq y \in A$. Geometrically speaking, a function $f$ is Lipschitz if there is a
uniform bound on the magnitude of the slopes of lines drawn through any two
points on the graph of $f$.

(a) Show that if $f : A \to \mathbf{R}$ is Lipschitz, then it is uniformly continuous on $A$.

(b)  Is  the  converse  statement  true?  Are  all  uniformly  continuous  functions 
necessarily  Lipschitz? 

Exercise 4.4.10. Assume that $f$ and $g$ are uniformly continuous functions
defined on a common domain $A$. Which of the following combinations are
necessarily uniformly continuous on $A$:

\[ f(x) + g(x), \quad f(x)g(x), \quad \frac{f(x)}{g(x)}, \quad f(g(x))? \]

(Assume that the quotient and the composition are properly defined and thus
at least continuous.)

Exercise 4.4.11 (Topological Characterization of Continuity). Let $g$ be
defined on all of $\mathbf{R}$. If $B$ is a subset of $\mathbf{R}$, define the set $g^{-1}(B)$ by

\[ g^{-1}(B) = \{x \in \mathbf{R} : g(x) \in B\}. \]

Show that $g$ is continuous if and only if $g^{-1}(O)$ is open whenever $O \subseteq \mathbf{R}$ is an
open set.


Exercise  4.4.12.  Review  Exercise  4.4.11,  and  then  determine  which  of  the 
following  statements  is  true  about  a  continuous  function  defined  on  R: 

(a) $f^{-1}(B)$ is finite whenever $B$ is finite.

(b) $f^{-1}(K)$ is compact whenever $K$ is compact.

(c) $f^{-1}(A)$ is bounded whenever $A$ is bounded.

(d) $f^{-1}(F)$ is closed whenever $F$ is closed.

Exercise 4.4.13 (Continuous Extension Theorem). (a) Show that a
uniformly continuous function preserves Cauchy sequences; that is, if
$f : A \to \mathbf{R}$ is uniformly continuous and $(x_n) \subseteq A$ is a Cauchy sequence,
then show $f(x_n)$ is a Cauchy sequence.

(b) Let $g$ be a continuous function on the open interval $(a, b)$. Prove that
$g$ is uniformly continuous on $(a, b)$ if and only if it is possible to define
values $g(a)$ and $g(b)$ at the endpoints so that the extended function $g$ is
continuous on $[a, b]$. (In the forward direction, first produce candidates
for $g(a)$ and $g(b)$, and then show the extended $g$ is continuous.)

Exercise 4.4.14. Construct an alternate proof of Theorem 4.4.7 using the
open cover characterization of compactness from the Heine-Borel Theorem
(Theorem 3.3.8 (iii)).


4.5  The  Intermediate  Value  Theorem 


The Intermediate Value Theorem (IVT) is the name given to the very intuitive
observation that a continuous function $f$ on a closed interval $[a, b]$ attains every
value that falls between the range values $f(a)$ and $f(b)$ (Fig. 4.8).

Here is this observation in the language of analysis.


Theorem 4.5.1 (Intermediate Value Theorem). Let $f : [a, b] \to \mathbf{R}$ be
continuous. If $L$ is a real number satisfying $f(a) < L < f(b)$ or $f(a) > L > f(b)$,
then there exists a point $c \in (a, b)$ where $f(c) = L$.


This theorem was freely used by mathematicians of the 18th century (including
Euler and Gauss) without any consideration of its validity. In fact, the first
analytical proof was not offered until 1817 by Bolzano in a paper that also
contains the first appearance of a somewhat modern definition of continuity. This
emphasizes the significance of this result. As discussed in Section 4.1, Bolzano
and his contemporaries had arrived at a point in the evolution of mathematics
where it was becoming increasingly important to firm up the foundations of the
subject. Doing so, however, was not simply a matter of going back and
supplying the missing proofs. The real battle lay in first obtaining a thorough and
mutually  agreed-upon  understanding  of  the  relevant  concepts.  The  importance 
of  the  Intermediate  Value  Theorem  for  us  is  similar  in  that  our  understanding 
of  continuity  and  the  nature  of  the  real  line  is  now  mature  enough  for  a  proof  to 
be  possible.  Indeed,  there  are  several  satisfying  arguments  for  this  simple  result, 
each  one  isolating,  in  a  slightly  different  way,  the  interplay  between  continuity 
and  completeness. 


Preservation  of  Connected  Sets 

The  most  potentially  useful  way  to  understand  the  Intermediate  Value  Theorem 
(IVT)  is  as  a  special  case  of  the  fact  that  continuous  functions  map  connected 
sets to connected sets. In Theorem 4.4.1, we saw that if $f$ is a continuous
function on a compact set $K$, then the range set $f(K)$ is also compact. The
analogous observation holds for connected sets.

Theorem 4.5.2 (Preservation of Connected Sets). Let $f : G \to \mathbf{R}$ be
continuous. If $E \subseteq G$ is connected, then $f(E)$ is connected as well.

Proof. Intending to use the characterization of connected sets in Theorem 3.4.6,
let $f(E) = A \cup B$ where $A$ and $B$ are disjoint and nonempty. Our goal is to
produce a sequence contained in one of these sets that converges to a limit in
the other.

Let

\[ C = \{x \in E : f(x) \in A\} \quad \text{and} \quad D = \{x \in E : f(x) \in B\}. \]

The sets $C$ and $D$ are called the preimages of $A$ and $B$, respectively. Using the
properties of $A$ and $B$, it is straightforward to check that $C$ and $D$ are nonempty
and disjoint and satisfy $E = C \cup D$. Now, we are assuming $E$ is a connected
set, so by Theorem 3.4.6, there exists a sequence $(x_n)$ contained in one of $C$ or
$D$ with $x = \lim x_n$ contained in the other. Finally, because $f$ is continuous at $x$,
we get $f(x) = \lim f(x_n)$. Thus, it follows that $f(x_n)$ is a convergent sequence
contained in either $A$ or $B$ while the limit $f(x)$ is an element of the other. With
another nod to Theorem 3.4.6, the proof is complete. □

In $\mathbf{R}$, a set is connected if and only if it is a (possibly unbounded) interval.
This fact, together with Theorem 4.5.2, leads to a short proof of the
Intermediate Value Theorem (Exercise 4.5.1). We should point out that the proof of
Theorem 4.5.2 does not make use of the equivalence between connected sets and
intervals in $\mathbf{R}$ but relies only on the general definitions. The previous comment
that this is the most useful way to approach IVT stems from the fact that,
although it is not discussed here, the definitions of continuity and
connectedness can be easily adapted to higher-dimensional settings. Theorem 4.5.2, then,
remains a valid conclusion in higher dimensions, whereas the Intermediate Value
Theorem is essentially a one-dimensional result.
Completeness

A typical way the Intermediate Value Theorem is applied is to prove the existence
of roots. Given $f(x) = x^2 - 2$, for instance, we see that f(1) = −1 and
f(2) = 2. Therefore, there exists a point c ∈ (1, 2) where f(c) = 0.

In this case, we can easily compute c = √2, meaning that we really did not
need IVT to show that f has a root. We spent a good deal of time in Chapter 1
proving that √2 exists, which was only possible once we insisted on the Axiom of
Completeness as part of our assumptions about the real numbers. The fact that
the Intermediate Value Theorem has just asserted that √2 exists suggests that
another way to understand this result is in terms of the relationship between
the continuity of f and the completeness of R.

The  Axiom  of  Completeness  (AoC)  from  the  first  chapter  states  that 
“Nonempty  sets  that  are  bounded  above  have  least  upper  bounds.”  Later,  we 
saw  that  the  Nested  Interval  Property  (NIP)  is  an  equivalent  way  to  assert  that 
the real numbers have no “gaps.” Either of these characterizations of completeness
can be used as the cornerstone for an alternate proof of Theorem 4.5.1.


Proof. I. (First approach using AoC.) To simplify matters a bit, let’s consider
the special case where f is a continuous function satisfying f(a) < 0 < f(b) and
show that f(c) = 0 for some c ∈ (a, b). First let

\[ K = \{x \in [a, b] : f(x) \le 0\}. \]

Notice that K is bounded above by b, and a ∈ K so K is not empty. Thus we
may appeal to the Axiom of Completeness to assert that c = sup K exists.
There are three cases to consider:

\[ f(c) > 0, \quad f(c) < 0, \quad \text{and} \quad f(c) = 0. \]

The fact that c is the least upper bound of K can be used to rule out the first
two cases, resulting in the desired conclusion that f(c) = 0. The details are
requested in Exercise 4.5.5(a).

II. (Second approach using NIP.) Again, consider the special case where
L = 0 and f(a) < 0 < f(b). Let $I_0 = [a, b]$, and consider the midpoint

\[ z = (a + b)/2. \]

If f(z) ≥ 0, then set $a_1 = a$ and $b_1 = z$. If f(z) < 0, then set $a_1 = z$ and $b_1 = b$.
In either case, the interval $I_1 = [a_1, b_1]$ has the property that f is negative at
the left endpoint and nonnegative at the right.

This procedure can be inductively repeated, setting the stage for an application
of the Nested Interval Property. The remainder of the argument is left as
Exercise 4.5.5(b). □
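
The halving procedure of the second approach can be imitated numerically. The following Python sketch is only an illustration, not part of the proof: the name `bisect_root`, the tolerance `tol`, and the test function $x^2 - 2$ on [1, 2] are choices made here, and floating-point arithmetic can only approximate the point whose existence NIP guarantees.

```python
def bisect_root(f, a, b, tol=1e-12):
    """Approximate a point c in [a, b] with f(c) = 0, assuming f is
    continuous and f(a) < 0 <= f(b). Each pass through the loop replaces
    [a, b] by one of its halves, mirroring the intervals I_0, I_1, I_2, ...
    """
    if not (f(a) < 0 <= f(b)):
        raise ValueError("need f(a) < 0 <= f(b)")
    while b - a > tol:
        z = (a + b) / 2      # midpoint of the current interval
        if f(z) >= 0:
            b = z            # keep [a, z]: negative on the left, nonnegative on the right
        else:
            a = z            # keep [z, b]: same sign pattern at the endpoints
    return (a + b) / 2


# Hypothetical usage: approximate the root of x^2 - 2 in [1, 2].
root = bisect_root(lambda x: x * x - 2, 1.0, 2.0)
print(root)
```

The loop preserves the property that f is negative at the left endpoint and nonnegative at the right, which is exactly the invariant the inductive construction relies on.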

The  Intermediate  Value  Property 

Does  the  Intermediate  Value  Theorem  have  a  converse? 

Definition 4.5.3. A function f has the intermediate value property on an
interval [a, b] if for all x < y in [a, b] and all L between f(x) and f(y), it is
always possible to find a point c ∈ (x, y) where f(c) = L.

Another  way  to  summarize  the  Intermediate  Value  Theorem  is  to  say  that 
every continuous function on [a, b] has the intermediate value property. There
is an understandable temptation to suspect that any function that has the
intermediate value property must necessarily be continuous, but that is not the
case. We have seen that


\[ g(x) = \begin{cases} \sin(1/x) & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases} \]

is  not  continuous  at  zero  (Example  4.2.6),  but  it  does  have  the  intermediate 
value  property  on  [0, 1]. 

The  intermediate  value  property  does  imply  continuity  if  we  insist  that  our 
function  is  monotone  (Exercise  4.5.3). 

Exercises 

Exercise 4.5.1. Show how the Intermediate Value Theorem follows as a corollary
to Theorem 4.5.2.

Exercise  4.5.2.  Provide  an  example  of  each  of  the  following,  or  explain  why 
the request is impossible.

(a) A continuous function defined on an open interval with range equal to a
closed  interval. 

(b)  A  continuous  function  defined  on  a  closed  interval  with  range  equal  to  an 
open  interval. 

(c)  A  continuous  function  defined  on  an  open  interval  with  range  equal  to  an 
unbounded  closed  set  different  from  R. 

(d)  A  continuous  function  defined  on  all  of  R  with  range  equal  to  Q. 


Exercise 4.5.3. A function f is increasing on A if f(x) ≤ f(y) for all x < y
in A. Show that if f is increasing on [a, b] and satisfies the intermediate value
property (Definition 4.5.3), then f is continuous on [a, b].


Exercise  4.5.4.  Let  g  be  continuous  on  an  interval  A  and  let  F  be  the  set  of 
points  where  g  fails  to  be  one-to-one;  that  is, 


\[ F = \{x \in A : g(x) = g(y) \text{ for some } y \neq x \text{ and } y \in A\}. \]


Show  F  is  either  empty  or  uncountable. 

Exercise 4.5.5. (a) Finish the proof of the Intermediate Value Theorem
using the Axiom of Completeness started previously.

(b) Finish the proof of the Intermediate Value Theorem using the Nested
Interval Property started previously.


Exercise 4.5.6. Let f : [0, 1] → R be continuous with f(0) = f(1).

(a) Show that there must exist x, y ∈ [0, 1] satisfying |x − y| = 1/2 and
f(x) = f(y).

(b) Show that for each n ∈ N there exist $x_n, y_n \in [0, 1]$ with $|x_n - y_n| = 1/n$
and $f(x_n) = f(y_n)$.

(c) If h ∈ (0, 1/2) is not of the form 1/n, there does not necessarily exist
|x − y| = h satisfying f(x) = f(y). Provide an example that illustrates
this using h = 2/5.


Exercise 4.5.7. Let f be a continuous function on the closed interval [0, 1]
with range also contained in [0, 1]. Prove that f must have a fixed point; that
is, show f(x) = x for at least one value of x ∈ [0, 1].


Exercise 4.5.8 (Inverse functions). If a function f : A → R is one-to-one,
then we can define the inverse function $f^{-1}$ on the range of f in the natural
way: $f^{-1}(y) = x$ where y = f(x).

Show that if f is continuous on an interval [a, b] and one-to-one, then $f^{-1}$ is
also continuous.


4.6 Sets of Discontinuity

Given a function f : R → R, define $D_f \subseteq \mathbf{R}$ to be the set of points where
the function f fails to be continuous. In Section 4.1, we saw that Dirichlet’s
function g(x) had $D_g = \mathbf{R}$. The modification h(x) of Dirichlet’s function had
$D_h = \mathbf{R} \setminus \{0\}$, zero being the only point of continuity. Finally, for Thomae’s
function t(x), we saw that $D_t = \mathbf{Q}$.

Exercise 4.6.1. Using modifications of these functions, construct a function
f : R → R so that

(a) $D_f = \mathbf{Z}^c$.

(b)  Df  =  {x  :  0  <  x  <  1}. 

Exercise 4.6.2. Given a countable set $A = \{a_1, a_2, a_3, \ldots\}$, define $f(a_n) = 1/n$
and f(x) = 0 for all x ∉ A. Find $D_f$.

We concluded the introduction with a question about whether $D_f$ could take
the form of any arbitrary subset of the real line. As it turns out, this is not
the case. The set of discontinuities of a real-valued function on R has a specific
topological structure that is not possessed by every subset of R. Specifically,
$D_f$, no matter how f is chosen, can always be written as the countable union
of closed sets. In the case where f is monotone, these closed sets can be taken
to be single points.

Monotone  Functions 

Classifying  Df  for  an  arbitrary  /  is  somewhat  involved,  so  it  is  interesting  that 
describing  Df  is  fairly  straightforward  for  the  class  of  monotone  functions. 

Definition 4.6.1. A function f : A → R is increasing on A if f(x) ≤ f(y)
whenever x < y and decreasing if f(x) ≥ f(y) whenever x < y in A. A
monotone function is one that is either increasing or decreasing.

Continuity of f at a point c means that $\lim_{x \to c} f(x) = f(c)$. One particular
way  for  a  discontinuity  to  occur  is  if  the  limit  from  the  right  at  c  is  different 
from  the  limit  from  the  left  at  c.  As  always  with  new  terminology,  we  need  to 
be  precise  about  what  we  mean  by  “from  the  left”  and  “from  the  right.” 

Definition 4.6.2. Given a limit point c of a set A and a function f : A → R,
we write

\[ \lim_{x \to c^+} f(x) = L \]

if for all ε > 0 there exists a δ > 0 such that |f(x) − L| < ε whenever 0 < x − c < δ.

Equivalently, in terms of sequences, $\lim_{x \to c^+} f(x) = L$ if $\lim f(x_n) = L$ for
all sequences $(x_n)$ satisfying $x_n > c$ and $\lim(x_n) = c$.
Exercise 4.6.3. State a similar definition for the left-hand limit

\[ \lim_{x \to c^-} f(x) = L. \]

Theorem 4.6.3. Given f : A → R and a limit point c of A, $\lim_{x \to c} f(x) = L$
if and only if

\[ \lim_{x \to c^-} f(x) = L \quad \text{and} \quad \lim_{x \to c^+} f(x) = L. \]

Exercise  4.6.4.  Supply  a  proof  for  this  proposition. 

Generally  speaking,  discontinuities  can  be  divided  into  three  categories: 

(i) If $\lim_{x \to c} f(x)$ exists but has a value different from f(c), the discontinuity
at c is called removable.

(ii) If $\lim_{x \to c^+} f(x) \neq \lim_{x \to c^-} f(x)$, then f has a jump discontinuity at c.

(iii) If $\lim_{x \to c} f(x)$ does not exist for some other reason, then the discontinuity
at c is called an essential discontinuity.

We are now equipped to characterize the set $D_f$ for an arbitrary monotone
function f.

Exercise  4.6.5.  Prove  that  the  only  type  of  discontinuity  a  monotone  function 
can  have  is  a  jump  discontinuity. 

Exercise 4.6.6. Construct a bijection between the set of jump discontinuities
of a monotone function f and a subset of Q. Conclude that $D_f$ for a monotone
function f must either be finite or countable, but not uncountable.

Df  for  an  Arbitrary  Function 

Recall  that  the  intersection  of  an  infinite  collection  of  closed  sets  is  closed,  but 
for  unions  we  must  restrict  ourselves  to  finite  collections  of  closed  sets  in  order 
to  ensure  the  union  is  closed.  For  open  sets  the  situation  is  reversed.  The 
arbitrary  union  of  open  sets  is  open,  but  only  finite  intersections  of  open  sets 
are  necessarily  open. 

Definition 4.6.4. A set that can be written as the countable union of closed
sets is in the class $F_\sigma$. (This definition also appeared in Section 3.5.)

In Section 4.1 we constructed functions where the set of discontinuity was R
(Dirichlet’s function), R\{0} (modified Dirichlet function), and Q (Thomae’s
function).

Exercise 4.6.7. (a) Show that in each of the above cases we get an $F_\sigma$ set
as the set where the function is discontinuous.

(b) Show that the two sets of discontinuity in Exercise 4.6.1 are $F_\sigma$ sets.
The upcoming argument depends on a concept called α-continuity.

Definition 4.6.5. Let f be defined on R, and let α > 0. The function f is
α-continuous at x ∈ R if there exists a δ > 0 such that for all y, z ∈ (x − δ, x + δ)
it follows that |f(y) − f(z)| < α.

The most important thing to note about this definition is that there is no
“for all” in front of the α > 0. As we will investigate, adding this quantifier
would make this definition equivalent to our definition of continuity. In a sense,
α-continuity is a measure of the variation of the function in the neighborhood
of a particular point. A function is α-continuous at a point c if there is some
interval centered at c in which the variation of the function never exceeds the
value α > 0.

Given a function f on R, define $D_f^{\alpha}$ to be the set of points where the function
f fails to be α-continuous. In other words,

\[ D_f^{\alpha} = \{x \in \mathbf{R} : f \text{ is not } \alpha\text{-continuous at } x\}. \]
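
To see how the choice of α matters, it may help to work one small case by hand. The step function s below is our own choice for illustration and is not taken from the text. Every interval (−δ, δ) contains points y < 0 < z with |s(y) − s(z)| = 1, so s is α-continuous at 0 exactly when α > 1, while at any x ≠ 0 the function is constant near x and hence α-continuous for every α > 0.

```latex
\[
s(x) = \begin{cases} 0 & \text{if } x \le 0 \\ 1 & \text{if } x > 0 \end{cases}
\qquad \Longrightarrow \qquad
D_s^{\alpha} = \begin{cases} \{0\} & \text{if } 0 < \alpha \le 1 \\ \emptyset & \text{if } \alpha > 1. \end{cases}
\]
```

In this example the sets shrink as α grows, and the single discontinuity of s is detected precisely by the small values of α.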

Exercise 4.6.8. Prove that, for a fixed α > 0, the set $D_f^{\alpha}$ is closed.

The  stage  is  set.  It  is  time  to  characterize  the  set  of  discontinuity  for  an 
arbitrary  function  /  on  R. 

Theorem 4.6.6. Let f : R → R be an arbitrary function. Then, $D_f$ is an $F_\sigma$
set.

Proof. Recall that

\[ D_f = \{x \in \mathbf{R} : f \text{ is not continuous at } x\}. \]

Exercise 4.6.9. If α < α′, show that $D_f^{\alpha'} \subseteq D_f^{\alpha}$.

Exercise 4.6.10. Let α > 0 be given. Show that if f is continuous at x, then
it is α-continuous at x as well. Explain how it follows that $D_f^{\alpha} \subseteq D_f$.

Exercise 4.6.11. Show that if f is not continuous at x, then f is not
α-continuous for some α > 0. Now explain why this guarantees that

\[ D_f = \bigcup_{n=1}^{\infty} D_f^{\alpha_n}, \]

where $\alpha_n = 1/n$.

Because each $D_f^{\alpha_n}$ is closed, the proof is complete. □




4.7  Epilogue 

Theorem  4.6.6  is  only  interesting  if  we  can  demonstrate  that  not  every  subset 
of R is an $F_\sigma$ set. This takes some effort and was included as an exercise in
Section  3.5  on  the  Baire  Category  Theorem.  Baire’s  Theorem  states  that  if  R  is 
written  as  the  countable  union  of  closed  sets,  then  at  least  one  of  these  sets  must 
contain  a  nonempty  open  interval.  Now  Q  is  the  countable  union  of  singleton 
points,  and  we  can  view  each  point  as  a  closed  set  that  obviously  contains  no 
intervals.  If  the  set  of  irrationals  I  were  a  countable  union  of  closed  sets,  it  would 
have  to  be  that  none  of  these  closed  sets  contained  any  open  intervals  or  else  they 
would  then  contain  some  rational  numbers.  But  this  leads  to  a  contradiction 
to  Baire’s  Theorem.  Thus,  I  is  not  the  countable  union  of  closed  sets,  and 
consequently it is not an $F_\sigma$ set. We may therefore conclude that there is no
function f that is continuous at every rational point and discontinuous at every
irrational  point.  This  should  be  compared  with  Thomae’s  function  discussed 
earlier. 

The converse question is interesting as well. Given an arbitrary $F_\sigma$ set, W.H.
Young  showed  in  1903  that  it  is  always  possible  to  construct  a  function  that  has 
discontinuities  precisely  on  this  set.  Exercise  4.3.14  gives  some  clues  for  how 
to  do  this  in  the  simpler  case  of  an  arbitrary  closed  set,  and  Exercise  4.6.2 
handles  the  case  of  an  arbitrary  countable  set.  Combining  the  techniques  in 
these  two  exercises  with  the  Dirichlet-type  definitions  we  have  seen  leads  to  a 
proof  of  Young’s  result.  (Try  it!)  A  function  demonstrating  the  converse  for  the 
monotone  case  described  in  Exercise  4.6.6  is  also  not  too  difficult  to  describe. 
Let 

\[ D = \{x_1, x_2, x_3, x_4, \ldots\} \]

be an arbitrary countable set of real numbers. In order to construct a monotone
function that has discontinuities precisely on D, we first consider a particular
$x_n \in D$ and define the step function

\[ u_n(x) = \begin{cases} 1/2^n & \text{for } x > x_n \\ 0 & \text{for } x \le x_n. \end{cases} \]

Observing that each $u_n(x)$ is monotone and everywhere continuous except for
a single discontinuity at $x_n$, we now set

\[ f(x) = \sum_{n=1}^{\infty} u_n(x). \]

The convergence of the series $\sum 1/2^n$ guarantees that our function f is defined
on all of R, and intuition certainly suggests that f is monotone with jump
discontinuities precisely on D. Providing a rigorous proof for this conclusion is
one  of  the  many  pleasures  that  awaits  in  Chapter  6,  where  we  take  up  the  study 
of  infinite  series  of  functions. 
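
A finite version of this construction can be explored by machine. The sketch below is an illustration only: it uses a partial sum over a short, hypothetical list of points in place of the full series, the names `make_f` and `D` are our own, and the step function is taken to be 0 at $x_n$ itself (an assumption about the convention at the jump point). Exact rational arithmetic is used so that the jump sizes are visible without rounding.

```python
from fractions import Fraction


def make_f(points):
    """Return the partial sum f_N(x) = u_1(x) + ... + u_N(x), where
    u_n(x) = 1/2^n if x > x_n and 0 if x <= x_n, for the finite list
    points = [x_1, ..., x_N]."""
    def f(x):
        return sum(Fraction(1, 2 ** n)
                   for n, xn in enumerate(points, start=1)
                   if x > xn)
    return f


# Hypothetical choice of x_1, x_2, x_3 (any distinct reals would do).
D = [Fraction(1, 2), Fraction(0), Fraction(1, 3)]
f = make_f(D)

print(f(Fraction(-1)))     # to the left of every x_n: no step has switched on
print(f(Fraction(1, 3)))   # at x_3 itself: only the step at x_2 = 0 contributes
print(f(Fraction(2, 5)))   # just to the right of x_3: the jump of size 1/8 is included
print(f(Fraction(1)))      # to the right of every x_n: all three steps contribute
```

The partial sum is increasing and jumps by $1/2^n$ as x crosses $x_n$; whether these features survive in the infinite sum is exactly the question deferred to Chapter 6.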


Chapter  5 

The  Derivative 


5.1  Discussion:  Are  Derivatives  Continuous? 


The  geometric  motivation  for  the  derivative  is  most  likely  familiar  territory. 
Given  a  function  g(x),  the  derivative  g'(x)  is  understood  to  be  the  slope  of  the 
graph  of  g  at  each  point  x  in  the  domain.  A  graphical  picture  (Fig.  5.1)  reveals 
the  impetus  behind  the  mathematical  definition 


\[ g'(c) = \lim_{x \to c} \frac{g(x) - g(c)}{x - c}. \]


The difference quotient (g(x) − g(c))/(x − c) represents the slope of the line
through  the  two  points  (x,  g(x))  and  (c,  g(c)).  By  taking  the  limit  as  x  approaches 
c,  we  arrive  at  a  well-defined  mathematical  meaning  for  the  slope  of  the  tangent 
line  at  x  =  c. 

The myriad applications of the derivative function are the topic of much
of the calculus sequence, as well as several other upper-level courses in
mathematics. None of these applied questions are pursued here in any length, but it
should be pointed out that the rigorous underpinnings for differentiation worked
out in this chapter are an essential foundation for any applied study. Eventually,
as the derivative is subjected to more and more complex manipulations,
it becomes crucial to know precisely how differentiation is defined and how it
interacts with other mathematical operations.

Figure 5.1: Definition of g'(c).

Although physical applications are not explicitly discussed, we will encounter
several questions of a more abstract quality as we develop the theory. Many of
these are concerned with the relationship between differentiation and continuity.
Are continuous functions always differentiable? If not, how nondifferentiable can
a continuous function be? Are differentiable functions continuous? Given that
a function f has a derivative at every point in its domain, what can we say
about the function f'? Is f' continuous? How accurately can we describe the
set  of  all  possible  derivatives,  or  are  there  no  restrictions?  Put  another  way,  if 
we  are  given  an  arbitrary  function  g,  is  it  always  possible  to  find  a  differentiable 
function  /  such  that  f'  =  g,  or  are  there  some  properties  that  g  must  possess  for 
this  to  occur?  In  our  study  of  continuity,  we  saw  that  restricting  our  attention 
to  monotone  functions  had  a  significant  impact  on  the  answers  to  questions 
about  sets  of  discontinuity.  What  effect,  if  any,  does  this  same  restriction  have 
on  our  questions  about  potential  sets  of  nondifferentiable  points?  Some  of  these 
issues  are  harder  to  resolve  than  others,  and  some  remain  unanswered  in  any 
satisfactory  way. 

A  particularly  useful  class  of  examples  for  this  discussion  are  functions  of 
the  form 


\[ g_n(x) = \begin{cases} x^n \sin(1/x) & \text{if } x \neq 0 \\ 0 & \text{if } x = 0. \end{cases} \]


When n = 0, we have seen (Example 4.2.6) that the oscillations of sin(1/x)
prevent $g_0(x)$ from being continuous at x = 0. When n = 1, these oscillations
are squeezed between |x| and −|x|, the result being that $g_1$ is continuous at
x = 0 (Example 4.3.6). Is $g_1'(0)$ defined? Using the preceding definition, we get

\[ g_1'(0) = \lim_{x \to 0} \frac{g_1(x)}{x} = \lim_{x \to 0} \sin(1/x), \]


which, as we now know, does not exist. Thus, $g_1$ is not differentiable at x = 0.
On the other hand, the same calculation shows that $g_2$ is differentiable at zero.
In fact, we have

\[ g_2'(0) = \lim_{x \to 0} x \sin(1/x) = 0. \]

At points different from zero, we can use the familiar rules of differentiation
(soon to be justified) to conclude that $g_2$ is differentiable everywhere in R with

\[ g_2'(x) = \begin{cases} -\cos(1/x) + 2x \sin(1/x) & \text{if } x \neq 0 \\ 0 & \text{if } x = 0. \end{cases} \]
Figure 5.2: The function $g_2(x) = x^2 \sin(1/x)$ near zero.


But now consider

\[ \lim_{x \to 0} g_2'(x). \]


Because the cos(1/x) term is not preceded by a factor of x, we must conclude
that this limit does not exist and that, consequently, the derivative function
is not continuous. To summarize, the function $g_2(x)$ is continuous and differentiable
everywhere on R (Fig. 5.2), the derivative function $g_2'$ is thus defined
everywhere on R, but $g_2'$ has a discontinuity at zero. The conclusion is that
derivatives need not, in general, be continuous!
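
The failure of this limit can be observed numerically by evaluating the derivative of $g_2$ along two sequences tending to zero. The sketch below is only an illustration: the function name `g2_prime` and the particular sequences are choices made here, the formula for the derivative is the one stated above, and floating-point values only approximate the exact ones.

```python
import math


def g2_prime(x):
    """Derivative of g_2(x) = x^2 sin(1/x), with the value 0 at x = 0."""
    if x == 0:
        return 0.0
    return -math.cos(1 / x) + 2 * x * math.sin(1 / x)


# Two sequences tending to 0: along x_k we have cos(1/x) = 1,
# and along y_k we have cos(1/x) = -1 (up to rounding).
for k in (10, 100, 1000):
    x_k = 1 / (2 * math.pi * k)
    y_k = 1 / ((2 * k + 1) * math.pi)
    print(k, g2_prime(x_k), g2_prime(y_k))
```

Along the first sequence the printed values stay close to −1 and along the second close to +1, so no single number can serve as the limit of the derivative at zero.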

The discontinuity in $g_2'$ is essential, meaning $\lim_{x \to 0} g_2'(x)$ does not exist as a
one-sided limit. But, what about a function with a simple jump discontinuity?
For example, does there exist a function h such that

\[ h'(x) = \begin{cases} -1 & \text{if } x \le 0 \\ 1 & \text{if } x > 0? \end{cases} \]


A  first  impression  may  bring  to  mind  the  absolute  value  function,  which  has 
slopes  of  —1  at  points  to  the  left  of  zero  and  slopes  of  1  to  the  right.  However,  the 
absolute  value  function  is  not  differentiable  at  zero.  We  are  seeking  a  function 
that  is  differentiable  everywhere,  including  the  point  zero,  where  we  are  insisting 
that  the  slope  of  the  graph  be  —1.  The  degree  of  difficulty  of  this  request  should 
start  to  become  apparent.  Without  sacrificing  differentiability  at  any  point,  we 
are  demanding  that  the  slopes  jump  from  —1  to  1  and  not  attain  any  value  in 
between. 

Although we have seen that continuity is not a required property of derivatives,
the intermediate value property will prove a more stubborn quality to
ignore.
5.2 Derivatives and the Intermediate Value Property


Although  the  definition  would  technically  make  sense  for  more  complicated 
domains, all of the interesting results about the relationship between a function
and its derivative require that the domain of the given function be an
interval.  Thinking  geometrically  of  the  derivative  as  a  rate  of  change,  it  should 
not  be  too  surprising  that  we  would  want  to  confine  the  independent  variable 
to  move  about  a  connected  domain. 

The  theory  of  functional  limits  from  Section  4.2  is  all  that  is  needed  to  supply 
a  rigorous  definition  for  the  derivative. 

Definition 5.2.1 (Differentiability). Let g : A → R be a function defined
on an interval A. Given c ∈ A, the derivative of g at c is defined by

\[ g'(c) = \lim_{x \to c} \frac{g(x) - g(c)}{x - c}, \]

provided this limit exists. In this case we say g is differentiable at c. If g' exists
for all points c ∈ A, we say that g is differentiable on A.

Example 5.2.2. (i) Consider $f(x) = x^n$, where n ∈ N, and let c be any
arbitrary point in R. Using the algebraic identity

\[ x^n - c^n = (x - c)(x^{n-1} + c x^{n-2} + c^2 x^{n-3} + \cdots + c^{n-1}), \]

we can calculate the familiar formula

\[ f'(c) = \lim_{x \to c} \frac{x^n - c^n}{x - c} = \lim_{x \to c} (x^{n-1} + c x^{n-2} + c^2 x^{n-3} + \cdots + c^{n-1}) = c^{n-1} + c^{n-1} + \cdots + c^{n-1} = n c^{n-1}. \]

(ii) If g(x) = |x|, then attempting to compute the derivative at c = 0 produces
the limit

\[ g'(0) = \lim_{x \to 0} \frac{|x|}{x}, \]

which is +1 or −1 depending on whether x approaches zero from the right
or left. Consequently, this limit does not exist, and we conclude that g is
not differentiable at zero.
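
Both parts of this example can be checked against actual difference quotients. The following sketch is illustrative only: the helper name `difference_quotient`, the choice n = 3 and c = 2 for part (i), and the step sizes are assumptions made here, and the computed values are floating-point approximations rather than limits.

```python
def difference_quotient(g, c, x):
    """Slope of the line through (c, g(c)) and (x, g(x)), for x != c."""
    return (g(x) - g(c)) / (x - c)


def cube(x):
    return x ** 3


steps = (1e-1, 1e-3, 1e-5)

# Part (i) with n = 3 and c = 2: the quotients should approach 3 * 2**2 = 12.
for h in steps:
    print(h, difference_quotient(cube, 2.0, 2.0 + h))

# Part (ii): for g(x) = |x| at c = 0 the one-sided quotients disagree.
right = [difference_quotient(abs, 0.0, h) for h in steps]
left = [difference_quotient(abs, 0.0, -h) for h in steps]
print(right, left)
```

The quotients for |x| equal +1 on every positive step and −1 on every negative one, which is the numerical shadow of the two different one-sided limits.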

Example  5.2.2  (ii)  is  a  reminder  that  continuity  of  g  does  not  imply  that  g 
is  necessarily  differentiable.  On  the  other  hand,  if  g  is  differentiable  at  a  point, 
then  it  is  true  that  g  must  be  continuous  at  this  point. 

Theorem 5.2.3. If g : A → R is differentiable at a point c ∈ A, then g is
continuous  at  c  as  well. 
Proof. We are assuming that

\[ g'(c) = \lim_{x \to c} \frac{g(x) - g(c)}{x - c} \]

exists, and we want to prove that $\lim_{x \to c} g(x) = g(c)$. But notice that the
Algebraic Limit Theorem for functional limits allows us to write

\[ \lim_{x \to c} (g(x) - g(c)) = \lim_{x \to c} \left( \frac{g(x) - g(c)}{x - c} \right)(x - c) = g'(c) \cdot 0 = 0. \]

It follows that $\lim_{x \to c} g(x) = g(c)$. □


Combinations  of  Differentiable  Functions 


The  Algebraic  Limit  Theorem  (Theorem  2.3.3)  led  easily  to  the  conclusion 
that  algebraic  combinations  of  continuous  functions  are  continuous.  With  only 
slightly  more  work,  we  arrive  at  a  similar  conclusion  for  sums,  products,  and 
quotients  of  differentiable  functions. 


Theorem 5.2.4 (Algebraic Differentiability Theorem). Let f and g be
functions defined on an interval A, and assume both are differentiable at some
point c ∈ A. Then,

(i) (f + g)'(c) = f'(c) + g'(c),

(ii) (kf)'(c) = kf'(c), for all k ∈ R,

(iii) (fg)'(c) = f'(c)g(c) + f(c)g'(c), and

(iv) $(f/g)'(c) = \dfrac{g(c) f'(c) - f(c) g'(c)}{[g(c)]^2}$, provided that g(c) ≠ 0.

Proof. Statements (i) and (ii) are left as exercises. To prove (iii), we rewrite the
difference quotient as

\[ \frac{(fg)(x) - (fg)(c)}{x - c} = \frac{f(x)g(x) - f(x)g(c) + f(x)g(c) - f(c)g(c)}{x - c} = f(x) \left[ \frac{g(x) - g(c)}{x - c} \right] + g(c) \left[ \frac{f(x) - f(c)}{x - c} \right]. \]

Because f is differentiable at c, it is continuous there and thus $\lim_{x \to c} f(x) = f(c)$.
This fact, together with the functional-limit version of the Algebraic Limit
Theorem (Theorem 4.2.4), justifies the conclusion

\[ \lim_{x \to c} \frac{(fg)(x) - (fg)(c)}{x - c} = f(c)g'(c) + f'(c)g(c). \]


A  similar  proof  of  (iv)  is  possible,  or  we  can  use  an  argument  based  on  the 
next  result.  Each  of  these  options  is  discussed  in  Exercise  5.2.3.  □ 
The composition of two differentiable functions also fortunately results in another
differentiable function. This fact is referred to as the Chain Rule. To discover
the proper formula for the derivative of the composition g ∘ f, we can
write

\[ (g \circ f)'(c) = \lim_{x \to c} \frac{g(f(x)) - g(f(c))}{x - c} = \lim_{x \to c} \frac{g(f(x)) - g(f(c))}{f(x) - f(c)} \cdot \frac{f(x) - f(c)}{x - c} = g'(f(c)) \cdot f'(c). \]

With a little polish, this string of equations could qualify as a proof except for the
pesky fact that the f(x) − f(c) expression causes problems in the denominator if
f(x) = f(c) for x values in arbitrarily small neighborhoods of c. (The function
$g_2(x)$ discussed in Section 5.1 exhibits this behavior near c = 0.) The upcoming
proof of the Chain Rule manages to finesse this problem but in content is essentially
the argument just given. Another approach is sketched in Exercise 5.2.4.


Theorem 5.2.5 (Chain Rule). Let f : A → R and g : B → R satisfy
f(A) ⊆ B so that the composition g ∘ f is defined. If f is differentiable at
c ∈ A and if g is differentiable at f(c) ∈ B, then g ∘ f is differentiable at c with
(g ∘ f)'(c) = g'(f(c)) · f'(c).

Proof. Because g is differentiable at f(c), we know that

\[ g'(f(c)) = \lim_{y \to f(c)} \frac{g(y) - g(f(c))}{y - f(c)}. \]

Another way to assert this same fact is to let d(y) be the difference quotient

\[ d(y) = \frac{g(y) - g(f(c))}{y - f(c)}, \tag{1} \]

and observe that $\lim_{y \to f(c)} d(y) = g'(f(c))$. At the moment, d(y) is not defined
when y = f(c), but it should seem natural to declare that d(f(c)) = g'(f(c)),
so that d is continuous at f(c).

Now, we come to the finesse. Equation (1) can be rewritten as

\[ g(y) - g(f(c)) = d(y)(y - f(c)). \tag{2} \]

Observe that this equation holds for all y ∈ B including y = f(c). Thus, we
are free to substitute y = f(t) for any arbitrary t ∈ A. If t ≠ c, we can divide
equation (2) by (t − c) to get

\[ \frac{g(f(t)) - g(f(c))}{t - c} = d(f(t)) \, \frac{f(t) - f(c)}{t - c} \]

for all t ≠ c. Finally, taking the limit as t → c and applying the Algebraic Limit
Theorem together with Theorem 4.3.9 yields the desired formula. □
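
The role of the patched difference quotient d can be seen in a small computation. The sketch below is an illustration under our own assumptions: the helper `make_d`, the outer function sin, and the inner function $(t - 1)^2$ with c = 0 are hypothetical choices, picked so that f(2) = f(0) and the naive quotient with denominator f(t) − f(c) would be undefined at t = 2.

```python
import math


def make_d(g, g_prime, fc):
    """Build d(y) from the proof: the difference quotient of g at f(c),
    extended by d(f(c)) = g'(f(c)) so that equation (2) holds for all y."""
    def d(y):
        if y == fc:
            return g_prime(fc)
        return (g(y) - g(fc)) / (y - fc)
    return d


g, g_prime = math.sin, math.cos    # hypothetical outer function and its derivative


def f(t):
    """Hypothetical inner function; note f(0) = f(2) = 1."""
    return (t - 1) ** 2


c = 0.0
d = make_d(g, g_prime, f(c))

for t in (2.0, 0.5, 1e-3):
    lhs = (g(f(t)) - g(f(c))) / (t - c)             # quotient for g o f
    rhs = d(f(t)) * (f(t) - f(c)) / (t - c)         # right side after dividing (2) by t - c
    print(t, lhs, rhs)
```

At t = 2 both sides are zero even though f(t) = f(c), and as t approaches c the common value approaches g'(f(c)) · f'(c), which for these choices is −2 cos(1).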
Figure 5.3: The Interior Extremum Theorem.


Darboux’s  Theorem 

One  conclusion  from  this  chapter’s  introduction  is  that  although  continuity  is 
necessary  for  the  derivative  to  exist,  it  is  not  the  case  that  the  derivative  function 
itself will always be continuous. Our specific example was $g_2(x) = x^2 \sin(1/x)$,
where we set $g_2(0) = 0$. By tinkering with the exponent of the leading $x^2$ factor,
it  is  possible  to  construct  examples  of  differentiable  functions  with  derivatives 
that  are  unbounded,  or  twice-differentiable  functions  that  have  discontinuous 
second  derivatives  (Exercise  5.2.7).  The  underlying  principle  in  all  of  these 
examples  is  that  by  controlling  the  size  of  the  oscillations  of  the  original  function, 
we  can  make  the  corresponding  oscillations  of  the  slopes  volatile  enough  to 
prevent  the  existence  of  the  relevant  limits. 

It  is  significant  that  for  this  class  of  examples,  the  discontinuities  that  arise 
are never simple jump discontinuities. (A precise definition of “jump discontinuity”
is presented in Section 4.6.) We are now ready to confirm our earlier
suspicions  that  although  derivatives  do  not  in  general  have  to  be  continuous, 
they  do  possess  the  intermediate  value  property.  (See  Definition  4.5.3.)  This 
surprising  observation  is  a  fairly  straightforward  corollary  to  the  more  obvious 
observation  that  differentiable  functions  attain  maximums  and  minimums  only 
at  points  where  the  derivative  is  equal  to  zero  (Fig.  5.3). 

Theorem 5.2.6 (Interior Extremum Theorem). Let f be differentiable on
an open interval (a, b). If f attains a maximum value at some point c ∈ (a, b)
(i.e., f(c) ≥ f(x) for all x ∈ (a, b)), then f'(c) = 0. The same is true if f(c) is
a minimum value.


Proof. Because c is in the open interval (a, b), we can construct two sequences
$(x_n)$ and $(y_n)$, which converge to c and satisfy $x_n < c < y_n$ for all n ∈ N. The
fact that f(c) is a maximum implies that $f(y_n) - f(c) \le 0$ for all n, and thus

\[ f'(c) = \lim_{n \to \infty} \frac{f(y_n) - f(c)}{y_n - c} \le 0 \]

by the Order Limit Theorem (Theorem 2.3.4). In a similar way,

\[ \frac{f(x_n) - f(c)}{x_n - c} \ge 0 \]

for each $x_n$ because both numerator and denominator are negative. This implies
that

\[ f'(c) = \lim_{n \to \infty} \frac{f(x_n) - f(c)}{x_n - c} \ge 0, \]

and therefore f'(c) = 0, as desired. □


The  Interior  Extremum  Theorem  is  the  fundamental  fact  behind  the  use  of 
the  derivative  as  a  tool  for  solving  applied  optimization  problems.  This  idea, 
discovered  and  exploited  by  Pierre  de  Fermat,  is  as  old  as  the  derivative  itself. 
Finding maximums and minimums is arguably why Fermat invented
his method of finding slopes of tangent lines. It was 200 years later that the
French mathematician Gaston Darboux (1842-1917) pointed out that Fermat’s
method of finding maximums and minimums carries with it the implication that
if a derivative function attains two distinct values $f'(a)$ and $f'(b)$, then it must
also attain every value in between.


Theorem 5.2.7 (Darboux’s Theorem). If $f$ is differentiable on an interval
$[a, b]$, and if $\alpha$ satisfies $f'(a) < \alpha < f'(b)$ (or $f'(a) > \alpha > f'(b)$), then there
exists a point $c \in (a, b)$ where $f'(c) = \alpha$.

Proof. We first simplify matters by defining a new function $g(x) = f(x) - \alpha x$
on $[a, b]$. Notice that $g$ is differentiable on $[a, b]$ with $g'(x) = f'(x) - \alpha$. In terms
of $g$, our hypothesis states that $g'(a) < 0 < g'(b)$, and we hope to show that
$g'(c) = 0$ for some $c \in (a, b)$.

The  remainder  of  the  argument  is  outlined  in  Exercise  5.2.11.  □ 


Exercises 

Exercise  5.2.1.  Supply  proofs  for  parts  (i)  and  (ii)  of  Theorem  5.2.4. 

Exercise  5.2.2.  Exactly  one  of  the  following  requests  is  impossible.  Decide 
which  it  is,  and  provide  examples  for  the  other  three.  In  each  case,  let’s  assume 
the  functions  are  defined  on  all  of  R. 

(a) Functions $f$ and $g$ not differentiable at zero but where $fg$ is differentiable
at zero.

(b) A function $f$ not differentiable at zero and a function $g$ differentiable at
zero where $fg$ is differentiable at zero.

(c) A function $f$ not differentiable at zero and a function $g$ differentiable at
zero where $f + g$ is differentiable at zero.

(d) A function $f$ differentiable at zero but not differentiable at any other point.

Exercise 5.2.3. (a) Use Definition 5.2.1 to produce the proper formula for
the derivative of $h(x) = 1/x$.

(b) Combine the result in part (a) with the Chain Rule (Theorem 5.2.5) to
supply a proof for part (iv) of Theorem 5.2.4.

(c) Supply a direct proof of Theorem 5.2.4 (iv) by algebraically manipulating
the difference quotient for $(f/g)$ in a style similar to the proof of
Theorem 5.2.4 (iii).

Exercise  5.2.4.  Follow  these  steps  to  provide  a  slightly  modified  proof  of  the 
Chain  Rule. 

(a) Show that a function $h : A \to \mathbf{R}$ is differentiable at $a \in A$ if and only if
there exists a function $l : A \to \mathbf{R}$ which is continuous at $a$ and satisfies

\[ h(x) - h(a) = l(x)(x - a) \quad \text{for all } x \in A. \]


(b)  Use  this  criterion  for  differentiability  (in  both  directions)  to  prove  Theorem 
5.2.5. 


Exercise 5.2.5. Let

\[ f_a(x) = \begin{cases} x^a & \text{if } x \geq 0 \\ 0 & \text{if } x < 0. \end{cases} \]

(a) For which values of $a$ is $f_a$ continuous at zero?

(b) For which values of $a$ is $f_a$ differentiable at zero? In this case, is the
derivative function continuous?

(c) For which values of $a$ is $f_a$ twice-differentiable?

Exercise 5.2.6. Let $g$ be defined on an interval $A$, and let $c \in A$.

(a) Explain why $g'(c)$ in Definition 5.2.1 could have been given by

\[ g'(c) = \lim_{h \to 0} \frac{g(c + h) - g(c)}{h}. \]

(b) Assume $A$ is open. If $g$ is differentiable at $c \in A$, show

\[ g'(c) = \lim_{h \to 0} \frac{g(c + h) - g(c - h)}{2h}. \]

Exercise 5.2.7. Let

\[ g_a(x) = \begin{cases} x^a \sin(1/x) & \text{if } x \neq 0 \\ 0 & \text{if } x = 0. \end{cases} \]

Find a particular (potentially noninteger) value for $a$ so that

(a) $g_a$ is differentiable on $\mathbf{R}$ but such that $g_a'$ is unbounded on $[0, 1]$.

(b) $g_a$ is differentiable on $\mathbf{R}$ with $g_a'$ continuous but not differentiable at zero.

(c) $g_a$ is differentiable on $\mathbf{R}$ and $g_a'$ is differentiable on $\mathbf{R}$, but such that $g_a''$ is
not continuous at zero.


Exercise 5.2.8. Review the definition of uniform continuity (Definition 4.4.4).
Given a differentiable function $f : A \to \mathbf{R}$, let’s say that $f$ is uniformly differentiable
on $A$ if, given $\epsilon > 0$, there exists a $\delta > 0$ such that

\[ \left| \frac{f(x) - f(y)}{x - y} - f'(y) \right| < \epsilon \quad \text{whenever } 0 < |x - y| < \delta. \]

(a) Is $f(x) = x^2$ uniformly differentiable on $\mathbf{R}$? How about $g(x) = x^3$?

(b) Show that if a function is uniformly differentiable on an interval $A$, then
the derivative must be continuous on $A$.

(c) Is there a theorem analogous to Theorem 4.4.7 for differentiation? Are
functions that are differentiable on a closed interval $[a, b]$ necessarily uniformly
differentiable?


Exercise  5.2.9.  Decide  whether  each  conjecture  is  true  or  false.  Provide  an 
argument  for  those  that  are  true  and  a  counterexample  for  each  one  that  is  false. 

(a) If $f'$ exists on an interval and is not constant, then $f'$ must take on some
irrational values.

(b) If $f'$ exists on an open interval and there is some point $c$ where $f'(c) > 0$,
then there exists a $\delta$-neighborhood $V_\delta(c)$ around $c$ in which $f'(x) > 0$ for
all $x \in V_\delta(c)$.

(c) If $f$ is differentiable on an interval containing zero and if $\lim_{x \to 0} f'(x) = L$,
then it must be that $L = f'(0)$.

Exercise 5.2.10. Recall that a function $f : (a, b) \to \mathbf{R}$ is increasing on $(a, b)$
if $f(x) \leq f(y)$ whenever $x < y$ in $(a, b)$. A familiar mantra from calculus is
that a differentiable function is increasing if its derivative is positive, but this
statement requires some sharpening in order to be completely accurate.

Show that the function

\[ g(x) = \begin{cases} x/2 + x^2 \sin(1/x) & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases} \]

is differentiable on $\mathbf{R}$ and satisfies $g'(0) > 0$. Now, prove that $g$ is not increasing
over any open interval containing 0.

In the next section we will see that $f$ is indeed increasing on $(a, b)$ if and
only if $f'(x) \geq 0$ for all $x \in (a, b)$.


Exercise 5.2.11. Assume that $g$ is differentiable on $[a, b]$ and satisfies
$g'(a) < 0 < g'(b)$.

(a) Show that there exists a point $x \in (a, b)$ where $g(a) > g(x)$, and a point
$y \in (a, b)$ where $g(y) < g(b)$.

(b) Now complete the proof of Darboux’s Theorem started earlier.

Figure 5.4: The Mean Value Theorem.


Exercise 5.2.12 (Inverse functions). If $f : [a, b] \to \mathbf{R}$ is one-to-one, then
there exists an inverse function $f^{-1}$ defined on the range of $f$ given by $f^{-1}(y) = x$
where $y = f(x)$. In Exercise 4.5.8 we saw that if $f$ is continuous on $[a, b]$,
then $f^{-1}$ is continuous on its domain. Let’s add the assumption that $f$ is
differentiable on $[a, b]$ with $f'(x) \neq 0$ for all $x \in [a, b]$. Show $f^{-1}$ is differentiable
with

\[ (f^{-1})'(y) = \frac{1}{f'(x)} \quad \text{where } y = f(x). \]


5.3  The  Mean  Value  Theorems 

The Mean Value Theorem (Fig. 5.4) makes the geometrically plausible assertion
that a differentiable function $f$ on an interval $[a, b]$ will, at some point, attain a
slope equal to the slope of the line through the endpoints $(a, f(a))$ and $(b, f(b))$.
More tersely put,

\[ f'(c) = \frac{f(b) - f(a)}{b - a} \]

for at least one point $c \in (a, b)$.

On the surface, there does not appear to be anything especially remarkable
about this observation. Its validity appears undeniable, much like the Intermediate
Value Theorem for continuous functions, and its proof is rather short.
The ease of the proof, however, is misleading, as it is built on top of some
hard-fought accomplishments from the study of limits and continuity. In this
regard, the Mean Value Theorem is a kind of reward for a job well done. As we
will see, it is a prize of exceptional value. Although the result itself is geometrically
unsurprising, the Mean Value Theorem is the cornerstone of the proof
for almost every major theorem pertaining to differentiation. We will use it to
prove L’Hospital’s rules regarding limits of quotients of differentiable functions.
A rigorous analysis of how infinite series of functions behave when differentiated
requires the Mean Value Theorem (Theorem 6.4.3), and it is the crucial step in
the proof of the Fundamental Theorem of Calculus (Theorem 7.5.1). It is also
the fundamental concept underlying Lagrange’s Remainder Theorem (Theorem
6.6.3) which approximates the error between a Taylor polynomial and the function
that generates it.

Figure 5.5: Rolle’s Theorem.

The  Mean  Value  Theorem  can  be  stated  in  various  degrees  of  generality, 
each  one  important  enough  to  be  given  its  own  special  designation.  Recall  that 
the  Extreme  Value  Theorem  (Theorem  4.4.2)  states  that  continuous  functions 
on  compact  sets  always  attain  maximum  and  minimum  values.  Combining  this 
observation  with  the  Interior  Extremum  Theorem  for  differentiable  functions 
(Theorem  5.2.6)  yields  a  special  case  of  the  Mean  Value  Theorem  first  noted  by 
the  mathematician  Michel  Rolle  (1652-1719)  (Fig.  5.5). 


Theorem 5.3.1 (Rolle’s Theorem). Let $f : [a, b] \to \mathbf{R}$ be continuous on $[a, b]$
and differentiable on $(a, b)$. If $f(a) = f(b)$, then there exists a point $c \in (a, b)$
where $f'(c) = 0$.


Proof. Because $f$ is continuous on a compact set, $f$ attains a maximum and a
minimum. If both the maximum and minimum occur at the endpoints, then $f$
is necessarily a constant function and $f'(x) = 0$ on all of $(a, b)$. In this case, we
can choose $c$ to be any point we like. On the other hand, if either the maximum
or minimum occurs at some point $c$ in the interior $(a, b)$, then it follows from
the Interior Extremum Theorem (Theorem 5.2.6) that $f'(c) = 0$. □


Theorem 5.3.2 (Mean Value Theorem). If $f : [a, b] \to \mathbf{R}$ is continuous on
$[a, b]$ and differentiable on $(a, b)$, then there exists a point $c \in (a, b)$ where

\[ f'(c) = \frac{f(b) - f(a)}{b - a}. \]

Proof. Notice that the Mean Value Theorem reduces to Rolle’s Theorem in the
case where $f(a) = f(b)$. The strategy of the proof is to reduce the more general
statement to this special case.

The equation of the line through $(a, f(a))$ and $(b, f(b))$ is

\[ y = \left( \frac{f(b) - f(a)}{b - a} \right)(x - a) + f(a). \]

We want to consider the difference between this line and the function $f(x)$. To
this end, let

\[ d(x) = f(x) - \left[ \left( \frac{f(b) - f(a)}{b - a} \right)(x - a) + f(a) \right], \]

and observe that $d$ is continuous on $[a, b]$, differentiable on $(a, b)$, and satisfies
$d(a) = 0 = d(b)$. Thus, by Rolle’s Theorem, there exists a point $c \in (a, b)$ where
$d'(c) = 0$. Because

\[ d'(x) = f'(x) - \frac{f(b) - f(a)}{b - a}, \]

we get

\[ 0 = f'(c) - \frac{f(b) - f(a)}{b - a}, \]

which completes the proof. □
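The theorem asserts that a point c exists without saying where it is. As a purely numerical illustration (not part of the proof), the sketch below locates such a point for the hypothetical example f(x) = x^3 on [0, 2]. The names `bisect_root` and `mvt_point` are our own, and the bisection step assumes that d'(x) = f'(x) - (f(b) - f(a))/(b - a) changes sign between the endpoints, which holds for this example but is not guaranteed for every differentiable f.

```python
import math

def bisect_root(func, lo, hi, tol=1e-12):
    """Approximate a root of func in [lo, hi] by bisection (assumes a sign change)."""
    f_lo, f_hi = func(lo), func(hi)
    if f_lo * f_hi > 0:
        raise ValueError("func must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = func(mid)
        if f_lo * f_mid <= 0:
            hi = mid                 # the sign change lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid    # the sign change lies in [mid, hi]
    return (lo + hi) / 2

def mvt_point(f, f_prime, a, b):
    """Find c in (a, b) with f'(c) equal to the slope of the secant line."""
    slope = (f(b) - f(a)) / (b - a)
    return bisect_root(lambda x: f_prime(x) - slope, a, b)

# Hypothetical example: f(x) = x^3 on [0, 2]; the secant slope is 4.
f = lambda x: x**3
f_prime = lambda x: 3 * x**2
c = mvt_point(f, f_prime, 0.0, 2.0)
print(c, 2 / math.sqrt(3))  # solving 3c^2 = 4 by hand gives c = 2/sqrt(3)
```

For this particular f the equation 3c^2 = 4 can be solved directly, so the numerical value can be compared against 2/sqrt(3).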


The point has been made that the Mean Value Theorem manages to find its
way into nearly every proof of any statement related to the geometrical nature
of the derivative. As a simple example, if $f$ is a constant function $f(x) = k$ on
some interval $A$, then a straightforward calculation of $f'$ using Definition 5.2.1
shows that $f'(x) = 0$ for all $x \in A$. But how do we prove the converse statement?
If we know that a differentiable function $g$ satisfies $g'(x) = 0$ everywhere on $A$,
our intuition suggests that we should be able to prove $g(x)$ is constant. It is the
Mean Value Theorem that provides us with a way to articulate rigorously what
seems geometrically valid.


Corollary 5.3.3. If $g : A \to \mathbf{R}$ is differentiable on an interval $A$ and satisfies
$g'(x) = 0$ for all $x \in A$, then $g(x) = k$ for some constant $k \in \mathbf{R}$.


Proof. Take $x, y \in A$ and assume $x < y$. Applying the Mean Value Theorem to
$g$ on the interval $[x, y]$, we see that

\[ g'(c) = \frac{g(y) - g(x)}{y - x} \]

for some $c \in A$. Now, $g'(c) = 0$, so we conclude that $g(y) = g(x)$. Set $k$ equal
to this common value. Because $x$ and $y$ are arbitrary, it follows that $g(x) = k$
for all $x \in A$. □

Corollary 5.3.4. If $f$ and $g$ are differentiable functions on an interval $A$ and
satisfy $f'(x) = g'(x)$ for all $x \in A$, then $f(x) = g(x) + k$ for some constant
$k \in \mathbf{R}$.

Proof. Let $h(x) = f(x) - g(x)$ and apply Corollary 5.3.3 to the differentiable
function $h$. □

The  Mean  Value  Theorem  has  a  more  general  form  due  to  Cauchy.  It  is  this 
generalized version of the theorem that is needed to analyze L’Hospital’s rules
and  Lagrange’s  Remainder  Theorem. 

Theorem 5.3.5 (Generalized Mean Value Theorem). If $f$ and $g$ are continuous
on the closed interval $[a, b]$ and differentiable on the open interval $(a, b)$,
then there exists a point $c \in (a, b)$ where

\[ [f(b) - f(a)]\,g'(c) = [g(b) - g(a)]\,f'(c). \]

If $g'$ is never zero on $(a, b)$, then the conclusion can be stated as

\[ \frac{f'(c)}{g'(c)} = \frac{f(b) - f(a)}{g(b) - g(a)}. \]

Proof. This result follows by applying the Mean Value Theorem to the function
$h(x) = [f(b) - f(a)]g(x) - [g(b) - g(a)]f(x)$. The details are requested in
Exercise 5.3.5. □


L’Hospital’s  Rules 


The Algebraic Limit Theorem asserts that when taking a limit of a quotient of
functions we can write

\[ \lim_{x \to c} \frac{f(x)}{g(x)} = \frac{\lim_{x \to c} f(x)}{\lim_{x \to c} g(x)}, \]

provided that each individual limit exists and $\lim_{x \to c} g(x)$ is not zero. If the
denominator does converge to zero and the numerator has a nonzero limit,
then it is not difficult to argue that the quotient $f(x)/g(x)$ grows in absolute
value without bound as $x$ approaches $c$. L’Hospital’s Rules are named for the
Marquis de L’Hospital (1661-1704), who learned the results from his tutor,
Johann Bernoulli (1667-1748), and published them in 1696 in what is regarded
as the first calculus text. Stated in different levels of generality, they are an
effective tool for handling the indeterminate cases when either numerator and
denominator both tend to zero or both tend simultaneously to infinity.

Theorem 5.3.6 (L’Hospital’s Rule: 0/0 case). Let $f$ and $g$ be continuous
on an interval containing $a$, and assume $f$ and $g$ are differentiable on this
interval with the possible exception of the point $a$. If $f(a) = g(a) = 0$ and
$g'(x) \neq 0$ for all $x \neq a$, then

\[ \lim_{x \to a} \frac{f'(x)}{g'(x)} = L \quad \text{implies} \quad \lim_{x \to a} \frac{f(x)}{g(x)} = L. \]

Proof. This argument follows from a straightforward application of the Generalized
Mean Value Theorem. It is requested as Exercise 5.3.11. □
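To see the 0/0 case at work on a concrete pair, the following sketch tabulates both quotients for the hypothetical choice f(x) = 1 - cos(x) and g(x) = x^2 at a = 0. Here f(0) = g(0) = 0 and g'(x) = 2x is nonzero for x ≠ 0, so the hypotheses of Theorem 5.3.6 are met. The functions and sample points are our own choices, and a table of values is only suggestive; it is no substitute for the proof.

```python
import math

def f(x):
    return 1.0 - math.cos(x)

def g(x):
    return x * x

def f_prime(x):
    return math.sin(x)

def g_prime(x):
    return 2.0 * x

# Sample points approaching a = 0 from the right.
samples = [10.0 ** (-k) for k in range(1, 5)]

# Each row holds x, f(x)/g(x), and f'(x)/g'(x).
table = [(x, f(x) / g(x), f_prime(x) / g_prime(x)) for x in samples]

for x, quotient, derivative_quotient in table:
    print(f"x = {x:.0e}   f/g = {quotient:.8f}   f'/g' = {derivative_quotient:.8f}")
```

Both columns settle near 1/2, the value of the limit of sin(x)/(2x) as x approaches 0, in agreement with the theorem.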


L’Hospital’s Rule remains true if we replace the assumption that $f(a) =
g(a) = 0$ with the hypothesis that $\lim_{x \to a} g(x) = \infty$. To this point we have not
been explicit about what it means to say that a limit equals $\infty$. The logical
structure of such a definition is precisely the same as it is for finite functional
limits. The difference is that rather than trying to force the function to take
on values in some small $\epsilon$-neighborhood around a proposed limit, we must show
that $g(x)$ eventually exceeds any proposed upper bound. The arbitrarily small
$\epsilon > 0$ is replaced by an arbitrarily large $M > 0$.


Definition 5.3.7. Given $g : A \to \mathbf{R}$ and a limit point $c$ of $A$, we say that
$\lim_{x \to c} g(x) = \infty$ if, for every $M > 0$, there exists a $\delta > 0$ such that whenever
$0 < |x - c| < \delta$ it follows that $g(x) > M$.

We can define $\lim_{x \to c} g(x) = -\infty$ in a similar way.
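A small sketch may help make the roles of M and δ concrete. For the hypothetical example g(x) = 1/x^2 with c = 0, the choice δ = 1/√M works, since 0 < |x| < δ gives x^2 < 1/M and hence g(x) > M. The helper names `delta_for` and `check` are our own, and the code only spot-checks a handful of points inside the punctured neighborhood rather than proving the claim.

```python
import math

def g(x):
    return 1.0 / (x * x)

def delta_for(M):
    """A delta responding to the challenge M for g(x) = 1/x^2 at c = 0."""
    return 1.0 / math.sqrt(M)

def check(M, fractions=(0.999, 0.5, 0.1, 0.001)):
    """Spot-check g(x) > M at sample points with 0 < |x| < delta."""
    delta = delta_for(M)
    points = [sign * frac * delta for frac in fractions for sign in (1.0, -1.0)]
    return all(g(x) > M for x in points)

for M in (1.0, 100.0, 1e6):
    print(M, delta_for(M), check(M))
```

Larger challenges M force smaller responses δ, mirroring the way smaller ε forces smaller δ for finite limits.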


The following version of L’Hospital’s Rule is typically referred to as the $\infty/\infty$
case even though the hypothesis only requires that the function in the denominator
tend to infinity. To simplify the notation of the proof, we state the result
using a one-sided limit.


Theorem 5.3.8 (L’Hospital’s Rule: $\infty/\infty$ case). Assume $f$ and $g$ are
differentiable on $(a, b)$ and that $g'(x) \neq 0$ for all $x \in (a, b)$. If $\lim_{x \to a} g(x) = \infty$
(or $-\infty$), then

\[ \lim_{x \to a} \frac{f'(x)}{g'(x)} = L \quad \text{implies} \quad \lim_{x \to a} \frac{f(x)}{g(x)} = L. \]


Proof. Let $\epsilon > 0$. Because $\lim_{x \to a} \frac{f'(x)}{g'(x)} = L$, there exists a $\delta_1 > 0$ such that

\[ \left| \frac{f'(x)}{g'(x)} - L \right| < \frac{\epsilon}{2} \]

for all $a < x < a + \delta_1$. For convenience of notation, let $t = a + \delta_1$ and note that
$t$ is fixed for the remainder of the argument.

Our functions are not defined at $a$, but for any $x \in (a, t)$ we can apply the
Generalized Mean Value Theorem on the interval $[x, t]$ to get

\[ \frac{f(x) - f(t)}{g(x) - g(t)} = \frac{f'(c)}{g'(c)} \]

for some $c \in (x, t)$. Our choice of $t$ then implies

\[ L - \frac{\epsilon}{2} < \frac{f(x) - f(t)}{g(x) - g(t)} < L + \frac{\epsilon}{2} \tag{1} \]

for all $x$ in $(a, t)$.

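In an effort to isolate the fraction $\frac{f(x)}{g(x)}$, the strategy is to multiply inequality
(1) by $(g(x) - g(t))/g(x)$. We need to be sure, however, that this quantity is
positive, which amounts to insisting that $1 > g(t)/g(x)$. Because $t$ is fixed and
$\lim_{x \to a} g(x) = \infty$, we can choose $\delta_2 > 0$ so that $g(x) > 0$ and $g(x) > g(t)$ for all
$a < x < a + \delta_2$. Carrying out the desired multiplication results in

\[ \left( L - \frac{\epsilon}{2} \right)\left( 1 - \frac{g(t)}{g(x)} \right) < \frac{f(x) - f(t)}{g(x)} < \left( L + \frac{\epsilon}{2} \right)\left( 1 - \frac{g(t)}{g(x)} \right), \]

which after some algebraic manipulations yields

\[ L - \frac{\epsilon}{2} + \frac{-L g(t) + \frac{\epsilon}{2} g(t) + f(t)}{g(x)} < \frac{f(x)}{g(x)} < L + \frac{\epsilon}{2} + \frac{-L g(t) - \frac{\epsilon}{2} g(t) + f(t)}{g(x)}. \]

Again, let’s remind ourselves that $t$ is fixed and that $\lim_{x \to a} g(x) = \infty$. Thus,
we can choose a $\delta_3$ such that $a < x < a + \delta_3$ implies that $g(x)$ is large enough
to ensure that both

\[ \frac{-L g(t) + \frac{\epsilon}{2} g(t) + f(t)}{g(x)} \quad \text{and} \quad \frac{-L g(t) - \frac{\epsilon}{2} g(t) + f(t)}{g(x)} \]

are less than $\epsilon/2$ in absolute value. Putting this all together and choosing
$\delta = \min\{\delta_1, \delta_2, \delta_3\}$ guarantees that

\[ \left| \frac{f(x)}{g(x)} - L \right| < \epsilon \]

for all $a < x < a + \delta$. □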


Exercises 

Exercise 5.3.1. Recall from Exercise 4.4.9 that a function $f : A \to \mathbf{R}$ is
Lipschitz on $A$ if there exists an $M > 0$ such that

\[ \left| \frac{f(x) - f(y)}{x - y} \right| \leq M \]

for all $x \neq y$ in $A$.

(a) Show that if $f$ is differentiable on a closed interval $[a, b]$ and if $f'$ is continuous
on $[a, b]$, then $f$ is Lipschitz on $[a, b]$.

(b) Review the definition of a contractive function in Exercise 4.3.11. If we
add the assumption that $|f'(x)| < 1$ on $[a, b]$, does it follow that $f$ is
contractive on this set?

Exercise 5.3.2. Let $f$ be differentiable on an interval $A$. If $f'(x) \neq 0$ on $A$,
show that $f$ is one-to-one on $A$. Provide an example to show that the converse
statement need not be true.


Exercise 5.3.3. Let $h$ be a differentiable function defined on the interval $[0, 3]$,
and assume that $h(0) = 1$, $h(1) = 2$, and $h(3) = 2$.

(a) Argue that there exists a point $d \in [0, 3]$ where $h(d) = d$.

(b) Argue that at some point $c$ we have $h'(c) = 1/3$.

(c) Argue that $h'(x) = 1/4$ at some point in the domain.

Exercise 5.3.4. Let $f$ be differentiable on an interval $A$ containing zero, and
assume $(x_n)$ is a sequence in $A$ with $(x_n) \to 0$ and $x_n \neq 0$.

(a) If $f(x_n) = 0$ for all $n \in \mathbf{N}$, show $f(0) = 0$ and $f'(0) = 0$.

(b) Add the assumption that $f$ is twice-differentiable at zero and show that
$f''(0) = 0$ as well.

Exercise 5.3.5. (a) Supply the details for the proof of Cauchy’s Generalized
Mean Value Theorem (Theorem 5.3.5).

(b) Give a graphical interpretation of the Generalized Mean Value Theorem
analogous to the one given for the Mean Value Theorem at the beginning
of Section 5.3. (Consider $f$ and $g$ as parametric equations for a curve.)


Exercise 5.3.6. (a) Let $g : [0, a] \to \mathbf{R}$ be differentiable, $g(0) = 0$, and
$|g'(x)| \leq M$ for all $x \in [0, a]$. Show $|g(x)| \leq Mx$ for all $x \in [0, a]$.

(b) Let $h : [0, a] \to \mathbf{R}$ be twice differentiable, $h'(0) = h(0) = 0$ and $|h''(x)| \leq M$
for all $x \in [0, a]$. Show $|h(x)| \leq Mx^2/2$ for all $x \in [0, a]$.

(c) Conjecture and prove an analogous result for a function that is differentiable
three times on $[0, a]$.


Exercise 5.3.7. A fixed point of a function $f$ is a value $x$ where $f(x) = x$.
Show that if $f$ is differentiable on an interval with $f'(x) \neq 1$, then $f$ can have
at most one fixed point.


Exercise 5.3.8. Assume $f$ is continuous on an interval containing zero and
differentiable for all $x \neq 0$. If $\lim_{x \to 0} f'(x) = L$, show $f'(0)$ exists and equals $L$.

Exercise 5.3.9. Assume $f$ and $g$ are as described in Theorem 5.3.6, but now
add the assumption that $f$ and $g$ are differentiable at $a$, and $f'$ and $g'$ are
continuous at $a$ with $g'(a) \neq 0$. Find a short proof for the 0/0 case of L’Hospital’s
Rule under this stronger hypothesis.

Exercise 5.3.10. Let $f(x) = x \sin(1/x^4) e^{-1/x^2}$ and $g(x) = e^{-1/x^2}$. Using the
familiar properties of these functions, compute the limit as $x$ approaches zero of
$f(x)$, $g(x)$, $f(x)/g(x)$, and $f'(x)/g'(x)$. Explain why the results are surprising
but not in conflict with the content of Theorem 5.3.6.¹


Exercise 5.3.11. (a) Use the Generalized Mean Value Theorem to furnish a
proof of the 0/0 case of L’Hospital’s Rule (Theorem 5.3.6).

(b) If we keep the first part of the hypothesis of Theorem 5.3.6 the same but
we assume that

\[ \lim_{x \to a} \frac{f'(x)}{g'(x)} = \infty, \]

does it necessarily follow that

\[ \lim_{x \to a} \frac{f(x)}{g(x)} = \infty? \]

Exercise 5.3.12. If $f$ is twice differentiable on an open interval containing $a$
and $f''$ is continuous at $a$, show

\[ \lim_{h \to 0} \frac{f(a + h) - 2f(a) + f(a - h)}{h^2} = f''(a). \]

(Compare this to Exercise 5.2.6(b).)


5.4 A Continuous Nowhere-Differentiable Function

Exploring  the  relationship  between  continuity  and  differentiability  has  led  to 
both  fruitful  results  and  pathological  counterexamples.  The  bulk  of  discussion 
to this point has focused on the continuity of derivatives, but historically a significant
amount of debate revolved around the question of whether continuous
functions were necessarily differentiable. Early in the chapter, we saw that continuity
was a requirement for differentiability, but, as the absolute value function
demonstrates,  the  converse  of  this  proposition  is  not  true.  A  function  can  be 
continuous  but  not  differentiable  at  some  point.  But  just  how  nondifferentiable 
can  a  continuous  function  be?  Given  a  finite  set  of  points,  it  is  not  difficult  to 
imagine  how  to  construct  a  graph  with  corners  at  each  of  these  points,  so  that 
the  corresponding  function  fails  to  be  differentiable  on  this  finite  set.  The  trick 
gets  more  difficult,  however,  when  the  set  becomes  infinite.  For  instance,  is  it 
possible  to  construct  a  function  that  is  continuous  on  all  of  R  but  fails  to  be 
differentiable  at  every  rational  point?  Not  only  is  this  possible,  but  the  situation 
is  even  more  disconcerting.  In  1872,  Karl  Weierstrass  presented  an  example  of 
a continuous function that was not differentiable at any point. (It seems to be
the case that Bernhard Bolzano had his own example of such a beast as early
as 1830, but it was not published until much later.)

¹ A large class of “counterexamples” of this sort to L’Hospital’s Rule is explored in [4].

Figure 5.6: The function $h(x)$.

Weierstrass actually discovered a class of nowhere-differentiable functions of
the form

\[ f(x) = \sum_{n=0}^{\infty} a^n \cos(b^n x), \]

where the values of $a$ and $b$ are carefully chosen. Such functions are specific
examples of Fourier series discussed in Section 8.5. The details of Weierstrass’
argument are simplified if we replace the cosine function with a piecewise linear
function that has oscillations qualitatively like $\cos(x)$.

Define

\[ h(x) = |x| \]

on the interval $[-1, 1]$ and extend the definition of $h$ to all of $\mathbf{R}$ by requiring
that $h(x + 2) = h(x)$. The result is a periodic “sawtooth” function (Fig. 5.6).


Exercise 5.4.1. Sketch a graph of $(1/2)h(2x)$ on $[-2, 3]$. Give a qualitative
description of the functions

\[ h_n(x) = \frac{1}{2^n} h(2^n x) \]

as $n$ gets larger.

Now, define

\[ g(x) = \sum_{n=0}^{\infty} h_n(x) = \sum_{n=0}^{\infty} \frac{1}{2^n} h(2^n x). \]

The claim is that $g(x)$ is continuous on all of $\mathbf{R}$ but fails to be differentiable at
any point.

Infinite  Series  of  Functions  and  Continuity 

The definition of $g(x)$ is a significant departure from the way we usually define
functions. For each $x \in \mathbf{R}$, $g(x)$ is defined to be the value of an infinite series.

Figure 5.7: A sketch of $g(x) = \sum_{n=0}^{\infty} (1/2^n) h(2^n x)$.


Exercise 5.4.2. Fix $x \in \mathbf{R}$. Argue that the series

\[ \sum_{n=0}^{\infty} \frac{1}{2^n} h(2^n x) \]

converges and thus $g(x)$ is properly defined.

Exercise 5.4.3. Taking the continuity of $h(x)$ as given, reference the proper
theorems from Chapter 4 that imply that the finite sum

\[ g_m(x) = \sum_{n=0}^{m} \frac{1}{2^n} h(2^n x) \]

is continuous on $\mathbf{R}$.

This  brings  us  to  an  archetypical  question  in  analysis:  When  do  conclusions 
that  are  valid  in  finite  settings  extend  to  infinite  ones?  A  finite  sum  of  continuous 
functions  is  certainly  continuous,  but  does  this  necessarily  hold  for  an  infinite 
sum  of  continuous  functions?  In  general,  we  will  see  that  this  is  not  always  the 
case.  For  this  particular  sum,  however,  the  continuity  of  the  limit  function  g(x) 
can  be  proved.  Deciphering  when  results  about  finite  sums  of  functions  extend 
to  infinite  sums  is  one  of  the  fundamental  themes  of  Chapter  6.  Although  a 
self-contained  argument  for  the  continuity  of  g  is  not  beyond  our  means  at  this 
point,  we  will  nevertheless  postpone  the  proof  (see,  for  example,  Exercise  6.4.3), 
leaving  it  as  an  enticement  for  the  upcoming  study  of  uniform  convergence. 

Exercise  5.4.4.  As  the  graph  in  Figure  5.7  suggests,  the  structure  of  g(x)  is 
quite  intricate.  Answer  the  following  questions,  assuming  that  g(x)  is  indeed 
continuous. 

(a) How do we know $g$ attains a maximum value $M$ on $[0, 2]$? What is this
value?

(b) Let $D$ be the set of points in $[0, 2]$ where $g$ attains its maximum. That is,
$D = \{x \in [0, 2] : g(x) = M\}$. Find one point in $D$.

(c)  Is  D  finite,  countable,  or  uncountable? 

Nondifferentiability

When  the  proper  tools  are  in  place,  the  proof  that  g  is  continuous  is  quite 
straightforward.  The  more  difficult  task  is  to  show  that  g  is  not  differentiable 
at  any  point  in  R. 

Let’s  first  look  at  the  point  x  =  0.  Our  function  g  does  not  appear  to 
be  differentiable  here,  and  a  rigorous  proof  is  not  too  difficult.  Consider  the 
sequence $x_m = 1/2^m$, where $m = 0, 1, 2, \ldots$.

Exercise 5.4.5. Show that

\[ \frac{g(x_m) - g(0)}{x_m - 0} = m + 1, \]

and use this to prove that $g'(0)$ does not exist.

Any temptation to say something like $g'(0) = \infty$ should be resisted. Setting
$x_m = -(1/2^m)$ in the previous argument produces difference quotients heading
toward $-\infty$. The geometric manifestation of this is the “cusp” that appears at
$x = 0$ in the graph of $g$.
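The behavior of these difference quotients can be observed numerically. The sketch below assumes the sawtooth h(x) = |x| on [-1, 1] with period 2 and approximates g by a partial sum; the function names and the cutoff of 60 terms are our own choices. Because every quantity involved is a power of 2, the floating-point arithmetic is exact at the points ±1/2^m, but the output illustrates the claim of Exercise 5.4.5 and does not prove it.

```python
def h(x):
    """Sawtooth: h(x) = |x| on [-1, 1], extended to the real line with period 2."""
    r = x % 2.0              # reduce to [0, 2)
    return min(r, 2.0 - r)   # distance from x to the nearest even integer

def g_partial(x, terms=60):
    """Partial sum of g(x) = sum over n >= 0 of h(2^n x) / 2^n."""
    return sum(h(2.0**n * x) / 2.0**n for n in range(terms))

def difference_quotient(x):
    """Difference quotient of the partial sum at 0, evaluated at x != 0."""
    return (g_partial(x) - g_partial(0.0)) / x

for m in range(6):
    x_m = 1.0 / 2**m
    print(m, difference_quotient(x_m), difference_quotient(-x_m))
```

The quotients along x_m = 1/2^m grow without bound while those along -1/2^m decrease without bound, which is the numerical shadow of the cusp at x = 0.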


Exercise 5.4.6. (a) Modify the previous argument to show that $g'(1)$ does
not exist. Show that $g'(1/2)$ does not exist.

(b) Show that $g'(x)$ does not exist for any rational number of the form
$x = p/2^k$ where $p \in \mathbf{Z}$ and $k \in \mathbf{N} \cup \{0\}$.

The points described in Exercise 5.4.6 (b) are called dyadic points. If
$x = p/2^k$ is a dyadic rational number, then the function $h_n$ has a corner at $x$ as long
as $n \geq k$. Thus, it should not be too surprising that $g$ fails to be differentiable
at points of this form. The argument is more delicate at points between the
dyadic points.

Assume $x$ is not a dyadic number. For a fixed value of $m \in \mathbf{N} \cup \{0\}$, $x$ falls
between two adjacent dyadic points,

\[ \frac{p_m}{2^m} < x < \frac{p_m + 1}{2^m}. \]

Set $x_m = p_m/2^m$ and $y_m = (p_m + 1)/2^m$. Repeating this for each $m$ yields two
sequences $(x_m)$ and $(y_m)$ satisfying

\[ \lim x_m = \lim y_m = x \quad \text{and} \quad x_m < x < y_m. \]

Exercise 5.4.7. (a) First prove the following general lemma: Let $f$ be defined
on an open interval $J$ and assume $f$ is differentiable at $a \in J$. If $(a_n)$ and $(b_n)$
are sequences satisfying $a_n < a < b_n$ and $\lim a_n = \lim b_n = a$, show

\[ f'(a) = \lim_{n \to \infty} \frac{f(b_n) - f(a_n)}{b_n - a_n}. \]

(b) Now use this lemma to show that $g'(x)$ does not exist.

Weierstrass’s original 1872 paper contained a demonstration that the infinite
sum

\[ f(x) = \sum_{n=0}^{\infty} a^n \cos(b^n x) \]

defined a continuous nowhere-differentiable function provided $0 < a < 1$ and
$b$ was an odd integer satisfying $ab > 1 + 3\pi/2$. The condition on $a$ is easy to
understand. If $0 < a < 1$, then $\sum_{n=0}^{\infty} a^n$ is a convergent geometric series, and
the forthcoming Weierstrass M-Test (Theorem 6.4.5) can be used to conclude
that $f$ is continuous. The restriction on $b$ is more mysterious. In 1916, G.H.
Hardy extended Weierstrass’ result to include any value of $b$ for which $ab \geq 1$.
Without looking at the details of either of these arguments, we nevertheless get
a sense that the lack of a derivative is intricately tied to the relationship between
the compression factor (the parameter $a$) and the rate at which the frequency
of the oscillations increases (the parameter $b$).
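The role of the condition 0 < a < 1 can be seen in a short computation. Since |a^n cos(b^n x)| ≤ a^n, the terms dropped after the Nth one contribute at most a^(N+1)/(1 - a), uniformly in x. The sketch below compares two partial sums against this geometric bound for the sample parameters a = 1/2 and b = 13, which we chose because b is odd and ab = 6.5 exceeds 1 + 3π/2. The function names are hypothetical, and the code says nothing about differentiability.

```python
import math

def weierstrass_partial(x, a=0.5, b=13, terms=20):
    """Partial sum of a^n cos(b^n x) for n = 0, ..., terms."""
    return sum(a**n * math.cos(b**n * x) for n in range(terms + 1))

def tail_bound(a, terms):
    """Geometric bound on the sum of a^n over n > terms, valid for 0 < a < 1."""
    return a ** (terms + 1) / (1.0 - a)

a, b = 0.5, 13
print(a * b > 1 + 3 * math.pi / 2)  # Weierstrass's condition on the product ab

for x in (0.0, 0.3, 1.7):
    gap = abs(weierstrass_partial(x, a, b, 30) - weierstrass_partial(x, a, b, 10))
    print(x, gap, tail_bound(a, 10))
```

Each gap stays below the bound no matter which x is chosen, and this uniformity is what the Weierstrass M-Test exploits.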

Exercise 5.4.8. Review the argument for the nondifferentiability of $g(x)$ at
nondyadic points. Does the argument still work if we replace $g(x)$ with the
summation $\sum_{n=0}^{\infty} (1/2^n) h(3^n x)$? Does the argument work for the function
$\sum_{n=0}^{\infty} (1/3^n) h(2^n x)$?


5.5  Epilogue 

Far  from  being  an  anomaly  to  be  relegated  to  the  margins  of  our  understanding 
of  continuous  functions,  Weierstrass’s  example  and  those  like  it  should  actually 
serve  as  a  guide  to  our  intuition.  The  image  of  continuity  as  a  smooth  curve 
in  our  mind’s  eye  severely  misrepresents  the  situation  and  is  the  result  of  a 
bias  stemming  from  an  overexposure  to  the  much  smaller  class  of  differentiable 
functions.  The  lesson  here  is  that  continuity  is  a  strictly  weaker  notion  than 
differentiability.  In  Section  3.6,  we  alluded  to  a  corollary  of  the  Baire  Category 
Theorem,  which  asserts  that  Weierstrass’s  construction  is  actually  typical  of 
continuous functions. We will see that most continuous functions are
nowhere-differentiable, so that it is really the differentiable functions that are the
exceptions rather than the rule. The details of how to phrase this observation more
rigorously  are  spelled  out  in  Section  8.2. 

To say that the nowhere-differentiable function $g$ constructed in the previous
section has “corners” at every point of its domain misses the mark. Weierstrass’s
original class of nowhere-differentiable functions was constructed from infinite
sums of smooth trigonometric functions. It is the densely nested oscillating
structure that makes the definition of a tangent line impossible. So what
happens when we restrict our attention to monotone functions? How
nondifferentiable can an increasing function be? Given a finite set of points, it is not difficult
to  piece  together  a  monotone  function  which  has  actual  corners — and  thus  is 
not  differentiable — at  each  point  in  the  given  set.  A  natural  question  is  whether 
there  exists  a  continuous,  monotone  function  that  is  nowhere-differentiable. 
Weierstrass  suspected  that  such  a  function  existed  but  only  managed  to  produce 
an  example  of  a  continuous,  increasing  function  which  failed  to  be  differentiable 
on a countable dense set (Exercise 7.5.11). In 1903, the French mathematician
Henri Lebesgue (1875-1941) demonstrated that Weierstrass’s intuition had
failed on this account. Lebesgue proved that a continuous, monotone function
would have to be differentiable at “almost” every point in its domain. To be
specific, Lebesgue showed that, for every $\epsilon > 0$, the set of points where such a
function fails to be differentiable can be covered by a countable union of
intervals whose lengths sum to less than $\epsilon$. This notion of “zero length,” or “measure
zero”  as  it  is  called,  was  encountered  in  our  discussion  of  the  Cantor  set  and  is 
explored  more  fully  in  Section  7.6,  where  Lebesgue’s  substantial  contribution  to 
the  theory  of  integration  is  discussed. 

With the relationship between the continuity of $f$ and the existence of $f'$
somewhat in hand, we once more return to the question of characterizing the set
of all derivatives. Not every function is a derivative. Darboux’s Theorem forces
us to conclude that there are some functions (those with jump discontinuities
in particular) that cannot appear as the derivative of some other function.
Another way to phrase Darboux’s Theorem is to say that all derivatives must
satisfy the intermediate value property. Continuous functions do possess the
intermediate value property, and it is natural to ask whether every continuous
function is necessarily a derivative. For this smaller class of functions, the
answer is yes. The Fundamental Theorem of Calculus, treated in Chapter 7,
states that, given a continuous function $f$, the function $F(x) = \int_a^x f$ satisfies
$F' = f$. This does the trick. The collection of derivatives at least contains the
continuous  functions.  The  search  for  a  concise  characterization  of  all  possible 
derivatives,  however,  remains  largely  unsuccessful. 

As a final remark, we will see that by cleverly choosing $f$, this technique
of defining $F$ via $F(x) = \int_a^x f$ can be used to produce examples of continuous
functions which fail to be differentiable on interesting sets, provided we can show
that $\int_a^x f$ is defined. The question of just how to define integration became a
central  theme  in  analysis  in  the  latter  half  of  the  19th  century  and  has  continued 
on  to  the  present.  Much  of  this  story  is  discussed  in  detail  in  Chapter  7  and 
Section  8.1. 


Chapter  6 


Sequences and Series of Functions

6.1  Discussion:  The  Power  of  Power  Series 

In 1689, Jakob Bernoulli published his Tractatus de seriebus infinitis
summarizing what was known about infinite series toward the end of the 17th century.
Full of clever calculations and conclusions, this publication was also notable for
one particular question that it didn’t answer; namely, what is the precise value
of the series

\[ 1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \cdots. \]

Bernoulli convincingly argued that $\sum_{n=1}^{\infty} 1/n^2$ converged to something less than
2 (see Example 2.4.4) but he was unable to find an explicit expression for
the limit. Generally speaking, it is much harder to sum a series than it is to
determine whether or not it converges. In fact, being able to find the sum of a
convergent series is the exception rather than the rule. In this case, however, the
series $\sum 1/n^2$ seemed so elementary; more elementary than, say, $\sum_{n=1}^{\infty} n^2/2^n$ or
$\sum_{n=1}^{\infty} 1/n(n+1)$, both of which Bernoulli was able to handle. “If anyone finds
and communicates to us that which has so far eluded our efforts,” Bernoulli
wrote, “great will be our gratitude.”¹

Geometric  series  are  the  most  prominent  class  of  examples  that  can  be  readily 
summed.  In  Example  2.7.5  we  proved  that 

\[ \frac{1}{1 - x} = 1 + x + x^2 + x^3 + \cdots \tag{1} \]

for all $|x| < 1$. Thus, for example, $\sum_{n=0}^{\infty} (1/2)^n = 2$ and $\sum_{n=0}^{\infty} (-1/3)^n = 3/4$.

¹ As quoted in [12], which contains a much more thorough account of this story.

Geometric series were part of mathematical folklore long before Bernoulli;
however, what was relatively novel in Bernoulli’s time was the idea of operating on
infinite series such as (1) with tools from the budding theory of calculus. For
instance, what happens if we take the derivative on each side of equation (1)?
The left side is easy enough: we just get $1/(1 - x)^2$. But what about the right
side? Adopting a 17th century mindset, a natural way to proceed is to treat the
infinite series as a polynomial, albeit of infinite degree. Differentiation across
equation (1) in this fashion gives

\[ \frac{1}{(1 - x)^2} = 0 + 1 + 2x + 3x^2 + 4x^3 + \cdots. \tag{2} \]

Is  this  a  valid  formula,  at  least  for  values  of  x  in  (—1,1)?  Empirical  evidence 
suggests  it  is.  Setting  x  =  1/2  we  get 

\[ 4 = \sum_{n=1}^{\infty} \frac{n}{2^{n-1}} = 1 + 1 + \frac{3}{4} + \frac{4}{8} + \frac{5}{16} + \cdots, \]


which  feels  plausible,  and  is  in  fact  true.  Although  not  Bernoulli’s  requested 
series,  this  does  suggest  a  possible  new  line  of  attack. 
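The empirical evidence for equation (2) is easy to gather by machine. The short sketch below (the function name is our own, and the choice of truncation points is arbitrary) compares partial sums of $1 + 2x + 3x^2 + \cdots$ at $x = 1/2$ with the value $1/(1-x)^2 = 4$. It is a numerical check only, not a justification of term-by-term differentiation.

```python
def derivative_series_partial(x, terms):
    """Partial sum of n * x^(n-1) for n = 1, ..., terms."""
    return sum(n * x**(n - 1) for n in range(1, terms + 1))

x = 0.5
closed_form = 1 / (1 - x)**2  # left side of equation (2) at x = 1/2

# The partial sums should creep up toward the closed-form value.
for terms in (5, 10, 20, 40):
    print(terms, derivative_series_partial(x, terms), closed_form)
```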

Manipulations  of  this  sort  can  be  used  to  create  a  wide  assortment  of  new 
series representations for familiar functions. Substituting $-x^2$ for $x$ in (1) gives

\[ \frac{1}{1 + x^2} = 1 - x^2 + x^4 - x^6 + x^8 - \cdots, \tag{3} \]

for all $x \in (-1,1)$.

Once  again  closing  our  eyes  to  the  potential  danger  of  treating  an  infinite 
series  as  though  it  were  a  polynomial,  let’s  see  what  happens  when  we  take 
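antiderivatives. Using the fact that

\[ (\arctan(x))' = \frac{1}{1 + x^2} \quad \text{and} \quad \arctan(0) = 0, \]

equation (3) becomes

\[ \arctan(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots. \tag{4} \]

Plugging $x = 1$ into equation (4) yields the striking relationship

\[ \frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \cdots. \tag{5} \]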

The constant $\pi$, which arises from the geometry of circles, has somehow found its
way into an equation involving the reciprocals of the odd integers. Is this a valid
formula? Can we really treat the infinite series in (3) like a finite polynomial?
Even if the answer is yes there is still another mystery to solve in this example.
Plugging $x = 1$ into equations (1), (2), or (3) yields mathematical gibberish, so
is it prudent to anticipate something meaningful arising from equation (4) at
this same value? Will any of these ideas get us closer to computing $\sum_{n=1}^{\infty} 1/n^2$?
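Before worrying about whether the value $x = 1$ is legitimate in equation (4), we can at least ask a computer what the alternating sum of odd reciprocals appears to do. This is a sketch under our own naming; the error estimate used in the comment is the standard one for alternating series with decreasing terms, and nothing here settles the questions just raised.

```python
import math

def odd_reciprocal_partial(terms):
    """Partial sum 1 - 1/3 + 1/5 - ... with the given number of terms."""
    return sum((-1)**k / (2 * k + 1) for k in range(terms))

target = math.pi / 4

# For an alternating series with decreasing terms, the error after
# `terms` terms is at most the size of the first omitted term.
for terms in (10, 100, 1000, 10000):
    partial = odd_reciprocal_partial(terms)
    print(terms, partial, abs(partial - target), 1 / (2 * terms + 1))
```

The convergence is slow: roughly one extra correct digit for every tenfold increase in the number of terms.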

As it turned out, Bernoulli’s plea for help was answered in an unexpected way
by Leonhard Euler. At a young age, Euler was a student of Jakob Bernoulli’s
brother Johann, and the stellar pupil quickly rose to become the preeminent
mathematician of his age. Euler’s solution is impossible to anticipate. In 1735,
he announced that

\[ 1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \cdots = \frac{\pi^2}{6}, \]

a provocative formula that, even more than equation (5), hints at deep
connections between geometry, number theory and analysis. Euler’s argument is
quite short, but it needs to be viewed in the context of the time in which it was
created. The “infinite polynomials” in this discussion are examples of power
series, and a major catalyst for the expanding power of calculus in the 17th and
18th centuries was a proliferation of techniques like the ones used to generate
formulas (2), (3), and (4). The machinations of both algebra and calculus are
relatively straightforward when restricted to the class of polynomials. So, if
in fact power series could be treated more or less like unending polynomials,
then there was a great incentive to try to find power series representations for
familiar functions like $e^x$, $\sqrt{1 + x}$, or $\sin(x)$.

The  appearance  of  arctan(x)  in  (4)  is  an  encouraging  sign  that  this  might 
indeed  always  be  possible.  One  of  Isaac  Newton’s  more  significant  achievements 
was to produce a generalization of the binomial formula. If $n \in \mathbf{N}$, then
old-fashioned finite algebra leads to the formula

\[ (1 + x)^n = 1 + nx + \frac{n(n-1)}{2!} x^2 + \frac{n(n-1)(n-2)}{3!} x^3 + \cdots + x^n. \]

Through a process of experimentation and intuition Newton realized that for
$r \notin \mathbf{N}$, the infinite series

\[ (1 + x)^r = 1 + rx + \frac{r(r-1)}{2!} x^2 + \frac{r(r-1)(r-2)}{3!} x^3 + \cdots \]

was meaningful, at least for $x \in (-1,1)$. Setting $r = -1$, for example, yields

\[ \frac{1}{1 + x} = 1 - x + x^2 - x^3 + x^4 - \cdots, \]

which is easily seen to be equivalent to equation (1). Setting $r = 1/2$ we get

\[ \sqrt{1 + x} = 1 + \frac{1}{2} x - \frac{1}{2^2 \, 2!} x^2 + \frac{3}{2^3 \, 3!} x^3 - \frac{3 \cdot 5}{2^4 \, 4!} x^4 + \cdots. \]

One way to lend a little credence to this formula for $\sqrt{1 + x}$ is to focus on the
first few terms and square the series:

\[ \left(\sqrt{1 + x}\right)^2 = \left(1 + \frac{1}{2} x - \frac{1}{8} x^2 + \cdots\right)\left(1 + \frac{1}{2} x - \frac{1}{8} x^2 + \cdots\right) \]
\[ = 1 + \left(\frac{1}{2} + \frac{1}{2}\right) x + \left(-\frac{1}{8} + \frac{1}{4} - \frac{1}{8}\right) x^2 + \cdots \]
\[ = 1 + x + 0x^2 + 0x^3 + \cdots. \]


Amid all of the unfounded assumptions we are making about infinity,
calculations like this induce a feeling of optimism about the legitimacy of our search
for  power  series  representations. 
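The squaring calculation can be pushed a few terms further with exact rational arithmetic. The sketch below uses function names of our own choosing: it generates the first six coefficients of Newton's series for $r = 1/2$ from the recurrence implicit in the formula (each coefficient is the previous one times $(r - (k-1))/k$) and multiplies the truncated series by itself, keeping only the powers of $x$ that the truncation determines.

```python
from fractions import Fraction

def binomial_coefficients(r, count):
    """First `count` coefficients of Newton's series for (1 + x)^r."""
    coeffs = [Fraction(1)]
    for k in range(1, count):
        # r(r-1)...(r-k+1)/k! is the previous coefficient times (r-(k-1))/k.
        coeffs.append(coeffs[-1] * (r - (k - 1)) / k)
    return coeffs

def truncated_square(coeffs):
    """Coefficients of the squared series, up to the truncation degree."""
    n = len(coeffs)
    return [sum(coeffs[i] * coeffs[k - i] for i in range(k + 1))
            for k in range(n)]

c = binomial_coefficients(Fraction(1, 2), 6)
print(c)                    # coefficients of sqrt(1 + x) through x^5
print(truncated_square(c))  # coefficients of the square through x^5
```

Working with `Fraction` rather than floating point means a vanishing coefficient shows up as an exact zero, not as a small rounding residue.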

Newton’s  binomial  series  is  the  starting  point  for  a  modern  proof  of  Euler’s 
famous  sum,  which  is  sketched  out  in  detail  in  Section  8.3.  Euler’s  original 
1735  argument,  however,  started  from  the  power  series  representation  for  sin(x). 
The formula

\[ \sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots \]

was known to Newton, Bernoulli, and Euler alike. In contrast to equation (1),
we will see that this formula is valid for all $x \in \mathbf{R}$. Factoring out $x$ and dividing
yields a power series with leading coefficient equal to 1:

\[ \frac{\sin(x)}{x} = 1 - \frac{x^2}{3!} + \frac{x^4}{5!} - \frac{x^6}{7!} + \cdots. \tag{6} \]

Euler’s  idea  was  to  continue  factoring  the  power  series  in  (6),  and  his  strategy 
for  doing  this  was  very  much  in  keeping  with  what  we  have  seen  so  far — treat 
the  power  series  as  though  it  were  a  polynomial  and  then  extend  the  pattern  to 
infinity. 

Factoring  a  polynomial  of,  say,  degree  three  is  straightforward  if  we  know 
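its roots. If $p(x) = 1 + ax + bx^2 + cx^3$ has roots $r_1$, $r_2$, and $r_3$, then

\[ p(x) = \left(1 - \frac{x}{r_1}\right)\left(1 - \frac{x}{r_2}\right)\left(1 - \frac{x}{r_3}\right). \]

To see this just directly substitute to get $p(0) = 1$ and $p(r_1) = p(r_2) = p(r_3) = 0$.

The roots of the power series in (6) are the nonzero roots of $\sin x$, or $x =
\pm\pi, \pm 2\pi, \pm 3\pi$, and so on. All right then: relying on his fabled intuition, Euler
surmised that

\[ 1 - \frac{x^2}{3!} + \frac{x^4}{5!} - \frac{x^6}{7!} + \cdots = \left(1 - \frac{x}{\pi}\right)\left(1 + \frac{x}{\pi}\right)\left(1 - \frac{x}{2\pi}\right)\left(1 + \frac{x}{2\pi}\right)\cdots \]
\[ = \left(1 - \frac{x^2}{\pi^2}\right)\left(1 - \frac{x^2}{4\pi^2}\right)\left(1 - \frac{x^2}{9\pi^2}\right)\cdots, \tag{7} \]

where in the last step adjacent pairs of factors have been multiplied together.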
What happens if we continue to multiply out the factors on the right? Well,
the constant term comes out to be 1 which happily matches the constant term
on the left. The magic comes when we compare the $x^2$ term on each side
of (7). Multiplying out the infinite number of factors on the right (using our
imagination as necessary) and collecting like powers of $x$, equation (7) becomes

\[ 1 - \frac{x^2}{3!} + \frac{x^4}{5!} - \cdots = 1 + \left(-\frac{1}{\pi^2} - \frac{1}{4\pi^2} - \frac{1}{9\pi^2} - \cdots\right) x^2 + \left(\frac{1}{4\pi^4} + \frac{1}{9\pi^4} + \cdots\right) x^4 + \cdots. \]

Equating the coefficients of $x^2$ on each side yields

\[ -\frac{1}{3!} = -\frac{1}{\pi^2} - \frac{1}{4\pi^2} - \frac{1}{9\pi^2} - \cdots, \]

which when we multiply by $-\pi^2$ becomes

\[ \frac{\pi^2}{6} = 1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \cdots. \]


Numerical  approximations  of  each  side  of  this  equation  confirmed  for  Euler 
that,  despite  the  audacious  leaps  in  his  argument,  he  had  landed  on  solid  ground. 
By  our  standards,  this  derivation  falls  well  short  of  being  a  proper  proof,  and 
we  will  have  to  tend  to  this  in  the  upcoming  chapters.  The  takeaway  of  this 
discussion is that the hard work ahead is worth the effort. Infinite series
representations of functions are both useful and surprisingly elegant, and can lead to
remarkable  conclusions  when  they  are  properly  handled. 
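In the spirit of Euler's numerical confirmation, here is a modern sketch of the same kind of check. The function names and truncation levels are our own, and the computation only supplies evidence: it compares partial sums of $\sum 1/n^2$ with $\pi^2/6$, and a truncated version of the product in (7) with $\sin(x)/x$ at the sample point $x = 1$.

```python
import math

def reciprocal_squares_partial(terms):
    """Partial sum 1 + 1/4 + 1/9 + ... + 1/terms^2."""
    return sum(1 / n**2 for n in range(1, terms + 1))

def euler_product(x, factors):
    """Truncated product of (1 - x^2 / (k*pi)^2) for k = 1, ..., factors."""
    result = 1.0
    for k in range(1, factors + 1):
        result *= 1 - x**2 / (k * math.pi)**2
    return result

target = math.pi**2 / 6
for terms in (10, 100, 1000):
    # The neglected tail lies between 1/(terms + 1) and 1/terms.
    print(terms, reciprocal_squares_partial(terms), target)

x = 1.0
for factors in (10, 100, 10000):
    print(factors, euler_product(x, factors), math.sin(x) / x)
```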

The  evidence  so  far  suggests  power  series  are  quite  robust  when  treated  as 
if  they  were  finite  in  nature.  Term-by-term  differentiation  produced  a  valid 
conclusion  in  equation  (2),  and  taking  antiderivatives  fared  similarly  well  in 
(4).  We  will  see  that  these  manipulations  are  not  always  justified  for  infinite 
series  of  more  general  types  of  functions.  What  is  it  about  power  series  in 
particular  that  makes  them  so  impervious  to  the  dangers  of  the  infinite?  Of 
the  many  unanswered  questions  in  this  discussion,  this  last  one  is  probably  the 
most  central,  and  the  most  important  to  understanding  series  of  functions  in 
general. 


6.2 Uniform Convergence of a Sequence of Functions

Adopting the same strategy we used in Chapter 2, we will initially concern
ourselves with the behavior and properties of converging sequences of
functions. Because convergence of infinite series is defined in terms of the associated
sequence of partial sums, the results from our study of sequences will be
immediately applicable to the questions we have raised about both power series and
more general infinite series of functions.

Figure 6.1: $f_1$, $f_5$, $f_{10}$, and $f_{20}$ where $f_n = (x^2 + nx)/n$.

Pointwise  Convergence 

Definition 6.2.1. For each $n \in \mathbf{N}$, let $f_n$ be a function defined on a set $A \subseteq \mathbf{R}$.
The sequence $(f_n)$ of functions converges pointwise on $A$ to a function $f$ if, for
all $x \in A$, the sequence of real numbers $f_n(x)$ converges to $f(x)$.

In this case, we write $f_n \to f$, $\lim f_n = f$, or $\lim_{n \to \infty} f_n(x) = f(x)$. This
last expression is helpful if there is any confusion as to whether $x$ or $n$ is the
limiting variable.


Example 6.2.2. (i) Consider

\[ f_n(x) = (x^2 + nx)/n \]

on all of $\mathbf{R}$. Graphs of $f_1$, $f_5$, $f_{10}$, and $f_{20}$ (Fig. 6.1) give an indication of
what is happening as $n$ gets larger. Algebraically, we can compute

\[ \lim_{n \to \infty} f_n(x) = \lim_{n \to \infty} \frac{x^2 + nx}{n} = \lim_{n \to \infty} \left(\frac{x^2}{n} + x\right) = x. \]

Thus, $(f_n)$ converges pointwise to $f(x) = x$ on $\mathbf{R}$.

(ii) Let $g_n(x) = x^n$ on the set $[0,1]$, and consider what happens as $n$ tends to
infinity (Fig. 6.2). If $0 \leq x < 1$, then we have seen that $x^n \to 0$. On the
other hand, if $x = 1$, then $x^n \to 1$. It follows that $g_n \to g$ pointwise on
$[0,1]$, where

\[ g(x) = \begin{cases} 0 & \text{for } 0 \leq x < 1 \\ 1 & \text{for } x = 1. \end{cases} \]

(iii) Consider $h_n(x) = x^{1 + \frac{1}{2n-1}}$ on the set $[-1,1]$ (Fig. 6.3). For a fixed $x \in
[-1,1]$ we have

\[ \lim_{n \to \infty} h_n(x) = x \lim_{n \to \infty} x^{\frac{1}{2n-1}} = |x|. \]

Figure 6.2: $g(x) = \lim_{n \to \infty} x^n$ is not continuous on $[0,1]$.

Figure 6.3: $h_n \to |x|$ on $[-1,1]$; limit is not differentiable.


Examples 6.2.2 (ii) and (iii) are our first indication that there is some difficult
work ahead of us. The central theme of this chapter is analyzing which
properties the limit function inherits from the approximating sequence. In Example
6.2.2 (iii) we have a sequence of differentiable functions converging pointwise to
a limit that is not differentiable at the origin. In Example 6.2.2 (ii), we see an
even more fundamental problem of a sequence of continuous functions
converging to a limit that is not continuous.
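The three pointwise limits in Example 6.2.2 can be watched numerically. The following sketch uses our own function names and arbitrary sample points. One assumption deserves mention: for negative $x$ we interpret $x^{1/(2n-1)}$ as the real odd root, so that $h_n(x) = |x|^{1 + 1/(2n-1)}$, which is how the code computes it.

```python
def f(n, x):
    """f_n(x) = (x^2 + n x) / n from Example 6.2.2 (i)."""
    return (x**2 + n * x) / n

def g(n, x):
    """g_n(x) = x^n on [0, 1] from Example 6.2.2 (ii)."""
    return x**n

def h(n, x):
    """h_n(x) = x^(1 + 1/(2n-1)) on [-1, 1] from Example 6.2.2 (iii).

    The odd root of a negative x is negative, so x times that root
    equals |x| raised to the power 1 + 1/(2n-1).
    """
    return abs(x) ** (1 + 1 / (2 * n - 1))

# Columns: n, f_n(2) -> 2, g_n(0.9) -> 0, g_n(1) -> 1, h_n(-0.5) -> 0.5
for n in (1, 10, 100, 1000):
    print(n, f(n, 2.0), g(n, 0.9), g(n, 1.0), h(n, -0.5))
```

Note how $g_n(0.9)$ and $g_n(1)$ head toward different limits even though the two inputs are close together.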

Continuity  of  the  Limit  Function 

With Example 6.2.2 (ii) firmly in mind, we begin this discussion with a doomed
attempt to prove that the pointwise limit of continuous functions is continuous.
Upon discovering the problem in the argument, we will be in a better position
to understand the need for a stronger notion of convergence for sequences of
functions.

Assume $(f_n)$ is a sequence of continuous functions on a set $A \subseteq \mathbf{R}$, and
assume $(f_n)$ converges pointwise to a limit $f$. To argue that $f$ is continuous, fix
a point $c \in A$, and let $\epsilon > 0$. We need to find a $\delta > 0$ such that

\[ |x - c| < \delta \quad \text{implies} \quad |f(x) - f(c)| < \epsilon. \]

By the triangle inequality,

\[ |f(x) - f(c)| = |f(x) - f_n(x) + f_n(x) - f_n(c) + f_n(c) - f(c)| \]
\[ \leq |f(x) - f_n(x)| + |f_n(x) - f_n(c)| + |f_n(c) - f(c)|. \]

Our first, optimistic impression is that each term in the sum on the right-hand
side can be made small: the first and third by the fact that $f_n \to f$, and the
middle term by the continuity of $f_n$. In order to use the continuity of $f_n$, we
must first establish which particular $f_n$ we are talking about. Because $c \in A$ is
fixed, choose $N \in \mathbf{N}$ so that

\[ |f_N(c) - f(c)| < \frac{\epsilon}{3}. \]

Now that $N$ is chosen, the continuity of $f_N$ implies that there exists a $\delta > 0$
such that

\[ |f_N(x) - f_N(c)| < \frac{\epsilon}{3} \]

for all $x$ satisfying $|x - c| < \delta$.

But here is the problem. We also need

\[ |f_N(x) - f(x)| < \frac{\epsilon}{3} \quad \text{for all } x \text{ satisfying } |x - c| < \delta. \]


The values of $x$ depend on $\delta$, which depends on the choice of $N$. Thus, we cannot
go back and simply choose a different $N$. More to the point, the variable $x$ is
not fixed the way $c$ is in this discussion but represents any point in the interval
$(c - \delta, c + \delta)$. Pointwise convergence implies that we can make $|f_n(x) - f(x)| < \epsilon/3$
for large enough values of $n$, but the value of $n$ depends on the point $x$. It is
possible that different values for $x$ will result in the need for different (larger)
choices for $n$. This phenomenon is apparent in Example 6.2.2 (ii). To achieve
the inequality

\[ |g_n(1/2) - g(1/2)| < \frac{1}{3} \]

we need $n \geq 2$, whereas

\[ |g_n(9/10) - g(9/10)| < \frac{1}{3} \]

is true only after $n \geq 11$.
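The dependence of $n$ on $x$ in this example is easy to tabulate. The sketch below (with a helper name of our own) searches for the smallest $n$ satisfying $x^n < 1/3$ at several points of $[0,1)$. Since $g(x) = 0$ there, this is the smallest $n$ with $|g_n(x) - g(x)| < 1/3$.

```python
def smallest_n(x, eps):
    """Smallest n >= 1 with x**n < eps, assuming 0 <= x < 1 and eps > 0."""
    n = 1
    while x**n >= eps:
        n += 1
    return n

# As x moves toward 1, the required n grows without bound, so no
# single choice of n works for every x in [0, 1) at once.
for x in (0.5, 0.9, 0.99, 0.999):
    print(x, smallest_n(x, 1 / 3))
```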


Uniform  Convergence 

To resolve this dilemma, we define a new, stronger notion of convergence of
functions.

Definition 6.2.3 (Uniform Convergence). Let $(f_n)$ be a sequence of
functions defined on a set $A \subseteq \mathbf{R}$. Then, $(f_n)$ converges uniformly on $A$ to a limit
function $f$ defined on $A$ if, for every $\epsilon > 0$, there exists an $N \in \mathbf{N}$ such that
$|f_n(x) - f(x)| < \epsilon$ whenever $n \geq N$ and $x \in A$.

To emphasize the difference between uniform convergence and pointwise
convergence, we restate Definition 6.2.1, being more explicit about the relationship
between $\epsilon$, $N$, and $x$. In particular, notice where the domain point $x$ is
referenced in each definition and consequently how the choice of $N$ then does or does
not depend on this value.

Definition 6.2.1B. Let $(f_n)$ be a sequence of functions defined on a set $A \subseteq \mathbf{R}$.
Then, $(f_n)$ converges pointwise on $A$ to a limit $f$ defined on $A$ if, for every
$\epsilon > 0$ and $x \in A$, there exists an $N \in \mathbf{N}$ (perhaps dependent on $x$) such that
$|f_n(x) - f(x)| < \epsilon$ whenever $n \geq N$.

The use of the adverb uniformly here should be reminiscent of its use in
the phrase “uniformly continuous” from Chapter 4. In both cases, the term
“uniformly” is employed to express the fact that the response ($\delta$ or $N$) to a
prescribed $\epsilon$ can be chosen to work simultaneously for all values of $x$ in the
relevant domain.


Example 6.2.4. (i) Let

\[ g_n(x) = \frac{1}{n(1 + x^2)}. \]

For any fixed $x \in \mathbf{R}$, we can see that $\lim g_n(x) = 0$ so that $g(x) = 0$ is the
pointwise limit of the sequence $(g_n)$ on $\mathbf{R}$. Is this convergence uniform?
The observation that $1/(1 + x^2) \leq 1$ for all $x \in \mathbf{R}$ implies that

\[ |g_n(x) - g(x)| = \left| \frac{1}{n(1 + x^2)} - 0 \right| \leq \frac{1}{n}. \]

Thus, given $\epsilon > 0$, we can choose $N > 1/\epsilon$ (which does not depend on $x$),
and it follows that

\[ n \geq N \quad \text{implies} \quad |g_n(x) - g(x)| < \epsilon \]

for all $x \in \mathbf{R}$. By Definition 6.2.3, $g_n \to 0$ uniformly on $\mathbf{R}$.

(ii) Look back at Example 6.2.2 (i), where we saw that $f_n(x) = (x^2 + nx)/n$
converges pointwise on $\mathbf{R}$ to $f(x) = x$. On $\mathbf{R}$, the convergence is not
uniform. To see this write

\[ |f_n(x) - f(x)| = \left| \frac{x^2 + nx}{n} - x \right| = \frac{x^2}{n}, \]

and notice that in order to force $|f_n(x) - f(x)| < \epsilon$, we are going to have
to choose

\[ N > \frac{x^2}{\epsilon}. \]

Although this is possible to do for each $x \in \mathbf{R}$, there is no way to choose
a single value of $N$ that will work for all values of $x$ at the same time.

On the other hand, we can show that $f_n \to f$ uniformly on the set $[-b, b]$.
By restricting our attention to a bounded interval, we may now assert that

\[ \frac{x^2}{n} \leq \frac{b^2}{n}. \]

Given $\epsilon > 0$, then, we can choose

\[ N > \frac{b^2}{\epsilon} \]

independently of $x \in [-b, b]$.
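The contrast in Example 6.2.4 can be made concrete by estimating the worst-case error over an interval. The sketch below is an illustration under stated assumptions: the function names are ours, the supremum is only approximated by a maximum over an evenly spaced grid on $[-b, b]$, and the grid is chosen so that it contains both the endpoints and the origin, where the respective maxima occur.

```python
def grid(b, samples=2001):
    """Evenly spaced points on [-b, b], including both endpoints and 0."""
    return [-b + 2 * b * k / (samples - 1) for k in range(samples)]

def sup_error_f(n, b):
    """Grid estimate of sup |f_n(x) - x| on [-b, b], f_n = (x^2 + n x)/n."""
    return max(abs((x**2 + n * x) / n - x) for x in grid(b))

def sup_error_g(n, b):
    """Grid estimate of sup |g_n(x) - 0| on [-b, b], g_n = 1/(n(1 + x^2))."""
    return max(abs(1 / (n * (1 + x**2))) for x in grid(b))

n = 100
for b in (3.0, 30.0, 300.0):
    # For f_n the worst error is b^2 / n, which grows with the interval;
    # for g_n it is 1 / n no matter how wide the interval is.
    print(b, sup_error_f(n, b), sup_error_g(n, b))
```

For a fixed $n$, the error for $f_n$ can be made as large as we like by widening the interval, which is the numerical face of the failure of uniform convergence on all of $\mathbf{R}$.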


Graphically speaking, the uniform convergence of $f_n$ to a limit $f$ on a set
$A$ can be visualized by constructing a band of radius $\pm\epsilon$ around the limit
function $f$. If $f_n \to f$ uniformly, then there exists a point in the sequence after which
each $f_n$ is completely contained in this $\epsilon$-strip (Fig. 6.4). This image should be
compared with the graphs in Figures 6.1-6.2 from Example 6.2.2 and the one
in Figure 6.5.


Cauchy  Criterion 

Recall  that  the  Cauchy  Criterion  for  convergent  sequences  of  real  numbers  was 
an  equivalent  characterization  of  convergence  which,  unlike  the  definition,  did 
not  make  explicit  mention  of  the  limit.  The  usefulness  of  the  Cauchy  Criterion 
suggests  the  need  for  an  analogous  characterization  of  uniformly  convergent 
sequences of functions. As with all statements about uniformity, pay special
attention to the relationship between the response variable ($N \in \mathbf{N}$) and the
domain variable ($x \in A$).

Theorem 6.2.5 (Cauchy Criterion for Uniform Convergence). A
sequence of functions $(f_n)$ defined on a set $A \subseteq \mathbf{R}$ converges uniformly on $A$ if
and only if for every $\epsilon > 0$ there exists an $N \in \mathbf{N}$ such that $|f_n(x) - f_m(x)| < \epsilon$
whenever $m, n \geq N$ and $x \in A$.

Proof. Exercise 6.2.5. □


Continuity  Revisited 

The  stronger  assumption  of  uniform  convergence  is  precisely  what  is  required  to 
remove  the  flaws  from  our  attempted  proof  that  the  limit  of  continuous  functions 
is  continuous. 

Theorem 6.2.6 (Continuous Limit Theorem). Let $(f_n)$ be a sequence of
functions defined on $A \subseteq \mathbf{R}$ that converges uniformly on $A$ to a function $f$. If
each $f_n$ is continuous at $c \in A$, then $f$ is continuous at $c$.

Figure 6.5: $g_n \to g$ pointwise, but not uniformly.

Proof. Fix $c \in A$ and let $\epsilon > 0$. Choose $N$ so that

\[ |f_N(x) - f(x)| < \frac{\epsilon}{3} \]

for all $x \in A$. Because $f_N$ is continuous, there exists a $\delta > 0$ for which

\[ |f_N(x) - f_N(c)| < \frac{\epsilon}{3} \]

is true whenever $|x - c| < \delta$. But this implies

\[ |f(x) - f(c)| = |f(x) - f_N(x) + f_N(x) - f_N(c) + f_N(c) - f(c)| \]
\[ \leq |f(x) - f_N(x)| + |f_N(x) - f_N(c)| + |f_N(c) - f(c)| \]
\[ < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon. \]

Thus, $f$ is continuous at $c \in A$. □

Exercises


Exercise 6.2.1. Let

\[ f_n(x) = \frac{nx}{1 + nx^2}. \]

(a) Find the pointwise limit of $(f_n)$ for all $x \in (0, \infty)$.

(b) Is the convergence uniform on $(0, \infty)$?

(c) Is the convergence uniform on $(0, 1)$?

(d) Is the convergence uniform on $(1, \infty)$?

Exercise 6.2.2. (a) Define a sequence of functions on $\mathbf{R}$ by

\[ f_n(x) = \begin{cases} 1 & \text{if } x = 1, \frac{1}{2}, \frac{1}{3}, \ldots, \frac{1}{n} \\ 0 & \text{otherwise} \end{cases} \]

and let $f$ be the pointwise limit of $f_n$.

Is each $f_n$ continuous at zero? Does $f_n \to f$ uniformly on $\mathbf{R}$? Is $f$
continuous at zero?

(b) Repeat this exercise using the sequence of functions

\[ g_n(x) = \begin{cases} x & \text{if } x = 1, \frac{1}{2}, \frac{1}{3}, \ldots, \frac{1}{n} \\ 0 & \text{otherwise.} \end{cases} \]

(c) Repeat the exercise once more with the sequence

\[ h_n(x) = \begin{cases} 1 & \text{if } x = \frac{1}{n} \\ x & \text{if } x = 1, \frac{1}{2}, \frac{1}{3}, \ldots, \frac{1}{n-1} \\ 0 & \text{otherwise.} \end{cases} \]

In each case, explain how the results are consistent with the content of
the Continuous Limit Theorem (Theorem 6.2.6).

Exercise 6.2.3. For each $n \in \mathbf{N}$ and $x \in [0, \infty)$, let

\[ g_n(x) = \frac{x}{1 + x^n} \quad \text{and} \quad h_n(x) = \begin{cases} 1 & \text{if } x \geq 1/n \\ nx & \text{if } 0 \leq x < 1/n. \end{cases} \]

Answer the following questions for the sequences $(g_n)$ and $(h_n)$:

(a) Find the pointwise limit on $[0, \infty)$.

(b) Explain how we know that the convergence cannot be uniform on $[0, \infty)$.

(c) Choose a smaller set over which the convergence is uniform and supply an
argument to show that this is indeed the case.

Exercise 6.2.4. Review Exercise 5.2.8 which includes the definition for a
uniformly differentiable function. Use the results discussed in Section 6.2 to
show that if $f$ is uniformly differentiable, then $f'$ is continuous.

Exercise 6.2.5. Using the Cauchy Criterion for convergent sequences of real
numbers (Theorem 2.6.4), supply a proof for Theorem 6.2.5. (First, define a
candidate for $f(x)$, and then argue that $f_n \to f$ uniformly.)

Exercise 6.2.6. Assume $f_n \to f$ on a set $A$. Theorem 6.2.6 is an example
of a typical type of question which asks whether a trait possessed by each $f_n$
is inherited by the limit function. Provide an example to show that all of
the following propositions are false if the convergence is only assumed to be
pointwise on $A$. Then go back and decide which are true under the stronger
hypothesis of uniform convergence.

(a) If each $f_n$ is uniformly continuous, then $f$ is uniformly continuous.

(b) If each $f_n$ is bounded, then $f$ is bounded.

(c) If each $f_n$ has a finite number of discontinuities, then $f$ has a finite number
of discontinuities.

(d) If each $f_n$ has fewer than $M$ discontinuities (where $M \in \mathbf{N}$ is fixed), then
$f$ has fewer than $M$ discontinuities.

(e) If each $f_n$ has at most a countable number of discontinuities, then $f$ has
at most a countable number of discontinuities.


Exercise 6.2.7. Let $f$ be uniformly continuous on all of $\mathbf{R}$, and define a
sequence of functions by $f_n(x) = f(x + \frac{1}{n})$. Show that $f_n \to f$ uniformly. Give an
example to show that this proposition fails if $f$ is only assumed to be continuous
and not uniformly continuous on $\mathbf{R}$.

Exercise 6.2.8. Let $(g_n)$ be a sequence of continuous functions that converges
uniformly to $g$ on a compact set $K$. If $g(x) \neq 0$ on $K$, show $(1/g_n)$ converges
uniformly on $K$ to $1/g$.

Exercise 6.2.9. Assume $(f_n)$ and $(g_n)$ are uniformly convergent sequences of
functions.

(a) Show that $(f_n + g_n)$ is a uniformly convergent sequence of functions.

(b) Give an example to show that the product $(f_n g_n)$ may not converge
uniformly.

(c) Prove that if there exists an $M > 0$ such that $|f_n| \leq M$ and $|g_n| \leq M$ for
all $n \in \mathbf{N}$, then $(f_n g_n)$ does converge uniformly.


Exercise 6.2.10. This exercise and the next explore partial converses of the
Continuous Limit Theorem (Theorem 6.2.6). Assume $f_n \to f$ pointwise on $[a, b]$
and the limit function $f$ is continuous on $[a, b]$. If each $f_n$ is increasing (but not
necessarily continuous), show $f_n \to f$ uniformly.

Exercise 6.2.11 (Dini’s Theorem). Assume $f_n \to f$ pointwise on a compact
set $K$ and assume that for each $x \in K$ the sequence $f_n(x)$ is increasing. Follow
these steps to show that if $f_n$ and $f$ are continuous on $K$, then the convergence
is uniform.

(a) Set $g_n = f - f_n$ and translate the preceding hypothesis into statements
about the sequence $(g_n)$.

(b) Let $\epsilon > 0$ be arbitrary, and define $K_n = \{x \in K : g_n(x) \geq \epsilon\}$. Argue that
$K_1 \supseteq K_2 \supseteq K_3 \supseteq \cdots$, and use this observation to finish the argument.

Exercise 6.2.12 (Cantor Function). Review the construction of the Cantor set $C \subseteq [0, 1]$ from Section 3.1. This exercise makes use of results and notation from this discussion.

(a) Define $f_0(x) = x$ for all $x \in [0, 1]$. Now, let

$$f_1(x) = \begin{cases} (3/2)x & \text{for } 0 \le x \le 1/3 \\ 1/2 & \text{for } 1/3 < x < 2/3 \\ (3/2)x - 1/2 & \text{for } 2/3 \le x \le 1. \end{cases}$$

Sketch $f_0$ and $f_1$ over $[0, 1]$ and observe that $f_1$ is continuous, increasing, and constant on the middle third $(1/3, 2/3) = [0, 1] \setminus C_1$.

(b) Construct $f_2$ by imitating this process of flattening out the middle third of each nonconstant segment of $f_1$. Specifically, let

$$f_2(x) = \begin{cases} (1/2) f_1(3x) & \text{for } 0 \le x \le 1/3 \\ f_1(x) & \text{for } 1/3 < x < 2/3 \\ (1/2) f_1(3x - 2) + 1/2 & \text{for } 2/3 \le x \le 1. \end{cases}$$

If we continue this process, show that the resulting sequence $(f_n)$ converges uniformly on $[0, 1]$.

(c) Let $f = \lim f_n$. Prove that $f$ is a continuous, increasing function on $[0, 1]$ with $f(0) = 0$ and $f(1) = 1$ that satisfies $f'(x) = 0$ for all $x$ in the open set $[0, 1] \setminus C$. Recall that the "length" of the Cantor set $C$ is 0. Somehow, $f$ manages to increase from 0 to 1 while remaining constant on a set of "length 1."
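The construction in Exercise 6.2.12 can be explored numerically. The sketch below is only an illustration, not part of the exercise: it assumes the recursion continues exactly as in parts (a) and (b), with every $f_n$ for $n \ge 1$ equal to $1/2$ on the middle third, and the names `cantor_step` and `sup_gap` are hypothetical helpers. Exact rational arithmetic is used so that the gaps between consecutive functions can be compared without rounding error.

```python
from fractions import Fraction


def cantor_step(n, x):
    """Return f_n(x) for a rational x in [0, 1], following the recursion
    f_{n}(x) = (1/2) f_{n-1}(3x) on [0, 1/3], 1/2 on (1/3, 2/3),
    and (1/2) f_{n-1}(3x - 2) + 1/2 on [2/3, 1], with f_0(x) = x."""
    if n == 0:
        return x
    third = Fraction(1, 3)
    if x <= third:
        return cantor_step(n - 1, 3 * x) / 2
    if x < 2 * third:
        return Fraction(1, 2)
    return cantor_step(n - 1, 3 * x - 2) / 2 + Fraction(1, 2)


def sup_gap(n, depth=6):
    """Largest value of |f_{n+1}(x) - f_n(x)| over the grid k / 3**depth.

    This is a maximum over sample points only; it is a lower bound for
    the true supremum over [0, 1]."""
    grid = [Fraction(k, 3 ** depth) for k in range(3 ** depth + 1)]
    return max(abs(cantor_step(n + 1, x) - cantor_step(n, x)) for x in grid)


if __name__ == "__main__":
    for n in range(4):
        print(n, sup_gap(n))
```

On this grid the gaps halve at each stage, which is the kind of geometric decay that makes a Cauchy Criterion argument for uniform convergence plausible; the code suggests the pattern but does not prove it.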

Exercise 6.2.13. Recall that the Bolzano-Weierstrass Theorem (Theorem 2.5.5) states that every bounded sequence of real numbers has a convergent subsequence. An analogous statement for bounded sequences of functions is not true in general, but under stronger hypotheses several different conclusions are possible. One avenue is to assume the common domain for all of the functions in the sequence is countable. (Another is explored in the next two exercises.)

Let $A = \{x_1, x_2, x_3, \ldots\}$ be a countable set. For each $n \in \mathbf{N}$, let $f_n$ be defined on $A$ and assume there exists an $M > 0$ such that $|f_n(x)| \le M$ for all $n \in \mathbf{N}$ and $x \in A$. Follow these steps to show that there exists a subsequence of $(f_n)$ that converges pointwise on $A$.




(a) Why does the sequence of real numbers $f_n(x_1)$ necessarily contain a convergent subsequence $(f_{n_k})$? To indicate that the subsequence of functions $(f_{n_k})$ is generated by considering the values of the functions at $x_1$, we will use the notation $f_{n_k} = f_{1,k}$.

(b) Now, explain why the sequence $f_{1,k}(x_2)$ contains a convergent subsequence.

(c) Carefully construct a nested family of subsequences $(f_{m,k})$, and show how this can be used to produce a single subsequence of $(f_n)$ that converges at every point of $A$.


Exercise 6.2.14. A sequence of functions $(f_n)$ defined on a set $E \subseteq \mathbf{R}$ is called equicontinuous if for every $\epsilon > 0$ there exists a $\delta > 0$ such that $|f_n(x) - f_n(y)| < \epsilon$ for all $n \in \mathbf{N}$ and $|x - y| < \delta$ in $E$.

(a) What is the difference between saying that a sequence of functions $(f_n)$ is equicontinuous and just asserting that each $f_n$ in the sequence is individually uniformly continuous?

(b) Give a qualitative explanation for why the sequence $g_n(x) = x^n$ is not equicontinuous on $[0, 1]$. Is each $g_n$ uniformly continuous on $[0, 1]$?
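A quick numerical look at $g_n(x) = x^n$ near the right endpoint may help with the qualitative explanation requested in part (b). The sketch below is an illustration only; the function name `endpoint_gap` and the choice $\delta = 0.01$ are assumptions made for the example. It fixes one $\delta$ and measures how far apart $g_n(1)$ and $g_n(1 - \delta)$ are as $n$ grows.

```python
def endpoint_gap(n, delta):
    """Return |g_n(1) - g_n(1 - delta)| for g_n(x) = x**n."""
    return 1.0 - (1.0 - delta) ** n


def gaps_for(delta, exponents=(1, 10, 100, 1000)):
    """Return the endpoint gaps for a fixed delta and several values of n."""
    return [endpoint_gap(n, delta) for n in exponents]


if __name__ == "__main__":
    for n, gap in zip((1, 10, 100, 1000), gaps_for(0.01)):
        print(n, gap)
```

For a single fixed $\delta$ the gap is small when $n = 1$ but approaches 1 as $n$ increases, so no one $\delta$ can serve every $g_n$ at once, even though each individual $g_n$ is continuous on the compact set $[0, 1]$.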


Exercise 6.2.15 (Arzela-Ascoli Theorem). For each $n \in \mathbf{N}$, let $f_n$ be a function defined on $[0, 1]$. If $(f_n)$ is bounded on $[0, 1]$ (that is, there exists an $M > 0$ such that $|f_n(x)| \le M$ for all $n \in \mathbf{N}$ and $x \in [0, 1]$) and if the collection of functions $(f_n)$ is equicontinuous (Exercise 6.2.14), follow these steps to show that $(f_n)$ contains a uniformly convergent subsequence.

(a) Use Exercise 6.2.13 to produce a subsequence $(f_{n_k})$ that converges at every rational point in $[0, 1]$. To simplify the notation, set $g_k = f_{n_k}$. It remains to show that $(g_k)$ converges uniformly on all of $[0, 1]$.

(b) Let $\epsilon > 0$. By equicontinuity, there exists a $\delta > 0$ such that

$$|g_k(x) - g_k(y)| < \frac{\epsilon}{3}$$

for all $|x - y| < \delta$ and $k \in \mathbf{N}$. Using this $\delta$, let $r_1, r_2, \ldots, r_m$ be a finite collection of rational points with the property that the union of the neighborhoods $V_\delta(r_i)$ contains $[0, 1]$.

Explain why there must exist an $N \in \mathbf{N}$ such that

$$|g_s(r_i) - g_t(r_i)| < \frac{\epsilon}{3}$$

for all $s, t \ge N$ and $r_i$ in the finite subset of $[0, 1]$ just described. Why does having the set $\{r_1, r_2, \ldots, r_m\}$ be finite matter?

(c) Finish the argument by showing that, for an arbitrary $x \in [0, 1]$,

$$|g_s(x) - g_t(x)| < \epsilon$$

for all $s, t \ge N$.

6.3 Uniform Convergence and Differentiation

Example 6.2.2 (iii) imposes some significant restrictions on what we might hope to be true regarding differentiation and uniform convergence. If $h_n \to h$ uniformly and each $h_n$ is differentiable, we should not anticipate that $h_n' \to h'$ because in this example $h'(x)$ does not even exist at $x = 0$. There are also examples (see Exercise 6.3.4) where $f_n \to f$ uniformly with $(f_n)$ and $f$ all differentiable, but the sequence $(f_n')$ diverges at every point of the domain.

The  key  assumption  necessary  to  be  able  to  prove  any  facts  about  the 
derivative  of  the  limit  function  is  that  the  sequence  of  derivatives  be  uniformly 
convergent.  This  may  sound  as  though  we  are  assuming  what  it  is  we  would 
like  to  prove,  and  there  is  some  validity  to  this  complaint.  The  more  hypotheses 
a  proposition  has,  the  more  difficult  it  is  to  apply.  The  content  of  the  next 
theorem  is  that  if  we  are  given  a  pointwise  convergent  sequence  of  differentiable 
functions,  and  if  we  know  that  the  sequence  of  derivatives  converges  uniformly 
to something, then the limit of the derivatives is indeed the derivative of the
limit. 


Theorem 6.3.1 (Differentiable Limit Theorem). Let $f_n \to f$ pointwise on the closed interval $[a, b]$, and assume that each $f_n$ is differentiable. If $(f_n')$ converges uniformly on $[a, b]$ to a function $g$, then the function $f$ is differentiable and $f' = g$.


Proof. Fix $c \in [a, b]$ and let $\epsilon > 0$. We want to argue that $f'(c)$ exists and equals $g(c)$. Because $f'$ is defined by the limit

$$f'(c) = \lim_{x \to c} \frac{f(x) - f(c)}{x - c},$$

our task is to produce a $\delta > 0$ so that

$$\left| \frac{f(x) - f(c)}{x - c} - g(c) \right| < \epsilon$$

whenever $0 < |x - c| < \delta$.

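To motivate the strategy of the proof, observe that for all $x \neq c$ and all $n \in \mathbf{N}$, the triangle inequality implies

$$\left| \frac{f(x) - f(c)}{x - c} - g(c) \right| \le \left| \frac{f(x) - f(c)}{x - c} - \frac{f_n(x) - f_n(c)}{x - c} \right| + \left| \frac{f_n(x) - f_n(c)}{x - c} - f_n'(c) \right| + |f_n'(c) - g(c)|.$$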

Our intent is to first find an $f_n$ that forces the first and third terms on the right-hand side to be less than $\epsilon/3$. Once we establish which $f_n$ we want, we can then use the differentiability of $f_n$ to produce a $\delta$ that makes the middle term less than $\epsilon/3$ for all $x$ satisfying $0 < |x - c| < \delta$.

Let's start by choosing an $N_1$ such that

$$(1) \qquad |f_m'(c) - g(c)| < \frac{\epsilon}{3}$$

for all $m \ge N_1$. We now invoke the uniform convergence of $(f_n')$ to assert (via Theorem 6.2.5) that there exists an $N_2$ such that $m, n \ge N_2$ implies

$$|f_m'(x) - f_n'(x)| < \frac{\epsilon}{3} \quad \text{for all } x \in [a, b].$$

Set $N = \max\{N_1, N_2\}$.

The function $f_N$ is differentiable at $c$, and so there exists a $\delta > 0$ for which

$$(2) \qquad \left| \frac{f_N(x) - f_N(c)}{x - c} - f_N'(c) \right| < \frac{\epsilon}{3}$$

whenever $0 < |x - c| < \delta$. This is our sought-after $\delta$, but it takes some effort to show that it has the desired property.

Fix an $x$ satisfying $0 < |x - c| < \delta$, let $m \ge N$, and apply the Mean Value Theorem to $f_m - f_N$ on the interval $[c, x]$. (If $x < c$ the argument is the same.) By MVT, there exists an $\alpha \in (c, x)$ such that

$$f_m'(\alpha) - f_N'(\alpha) = \frac{(f_m(x) - f_N(x)) - (f_m(c) - f_N(c))}{x - c}.$$

Recall that our choice of $N$ implies

$$|f_m'(\alpha) - f_N'(\alpha)| < \frac{\epsilon}{3},$$

and so it follows that

$$\left| \frac{f_m(x) - f_m(c)}{x - c} - \frac{f_N(x) - f_N(c)}{x - c} \right| < \frac{\epsilon}{3}.$$

Because $f_m \to f$ we can take the limit as $m \to \infty$, and the Order Limit Theorem (Theorem 2.3.4) asserts that

$$(3) \qquad \left| \frac{f(x) - f(c)}{x - c} - \frac{f_N(x) - f_N(c)}{x - c} \right| \le \frac{\epsilon}{3}.$$

Finally, the inequalities in (1), (2), and (3) together imply that for $x$ satisfying $0 < |x - c| < \delta$,

$$\left| \frac{f(x) - f(c)}{x - c} - g(c) \right| \le \left| \frac{f(x) - f(c)}{x - c} - \frac{f_N(x) - f_N(c)}{x - c} \right| + \left| \frac{f_N(x) - f_N(c)}{x - c} - f_N'(c) \right| + |f_N'(c) - g(c)| < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon.$$

□

The hypothesis in the Differentiable Limit Theorem is unnecessarily strong. We actually do not need to assume that $f_n(x) \to f(x)$ at each point in the domain because the assumption that the sequence of derivatives $(f_n')$ converges uniformly is nearly strong enough to prove that $(f_n)$ converges, uniformly in fact. Two functions with the same derivative may differ by a constant, so we must assume that there is at least one point $x_0$ where $f_n(x_0) \to f(x_0)$.


Theorem 6.3.2. Let $(f_n)$ be a sequence of differentiable functions defined on the closed interval $[a, b]$, and assume $(f_n')$ converges uniformly on $[a, b]$. If there exists a point $x_0 \in [a, b]$ where $f_n(x_0)$ is convergent, then $(f_n)$ converges uniformly on $[a, b]$.

Proof. Exercise 6.3.7. □

Combining the last two results produces a stronger version of Theorem 6.3.1.

Theorem 6.3.3. Let $(f_n)$ be a sequence of differentiable functions defined on the closed interval $[a, b]$, and assume $(f_n')$ converges uniformly to a function $g$ on $[a, b]$. If there exists a point $x_0 \in [a, b]$ for which $f_n(x_0)$ is convergent, then $(f_n)$ converges uniformly. Moreover, the limit function $f = \lim f_n$ is differentiable and satisfies $f' = g$.


Exercises 

Exercise 6.3.1. Consider the sequence of functions defined by

$$g_n(x) = \frac{x^n}{n}.$$

(a) Show $(g_n)$ converges uniformly on $[0, 1]$ and find $g = \lim g_n$. Show that $g$ is differentiable and compute $g'(x)$ for all $x \in [0, 1]$.

(b) Now, show that $(g_n')$ converges on $[0, 1]$. Is the convergence uniform? Set $h = \lim g_n'$ and compare $h$ and $g'$. Are they the same?
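The behavior asked about in Exercise 6.3.1 can be previewed numerically. The sketch below is an illustration only, assuming the sequence in the exercise is $g_n(x) = x^n/n$ with derivative $g_n'(x) = x^{n-1}$; the helper names and the 1001-point grid are choices made for the example. It samples $|g_n|$ on $[0, 1]$ and evaluates $g_n'$ at an interior point and at the endpoint $x = 1$.

```python
def g(n, x):
    """g_n(x) = x**n / n."""
    return x ** n / n


def g_prime(n, x):
    """Derivative of g_n, namely x**(n - 1)."""
    return x ** (n - 1)


def sup_on_grid(n, points=1001):
    """Largest sampled value of |g_n(x)| on an evenly spaced grid in [0, 1]."""
    grid = [k / (points - 1) for k in range(points)]
    return max(abs(g(n, x)) for x in grid)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, sup_on_grid(n), g_prime(n, 0.5), g_prime(n, 1.0))
```

The sampled maximum of $|g_n|$ shrinks like $1/n$, while $g_n'(1)$ stays equal to 1 for every $n$ even though $g_n'(x)$ becomes tiny at interior points. This is worth keeping in mind when comparing $h$ with $g'$ in part (b).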

Exercise  6.3.2.  Consider  the  sequence  of  functions 


(a)  Compute  the  pointwise  limit  of  (hn)  and  then  prove  that  the  convergence 
is  uniform  on  R. 

(b)  Note  that  each  hn  is  differentiable.  Show  g(x)  =  lim h'n(x)  exists  for  all 
x ,  and  explain  how  we  can  be  certain  that  the  convergence  is  not  uniform 
on  any  neighborhood  of  zero. 

Exercise 6.3.3. Consider the sequence of functions

$$f_n(x) = \frac{x}{1 + nx^2}.$$

(a) Find the points on $\mathbf{R}$ where each $f_n(x)$ attains its maximum and minimum value. Use this to prove $(f_n)$ converges uniformly on $\mathbf{R}$. What is the limit function?

(b) Let $f = \lim f_n$. Compute $f_n'(x)$ and find all the values of $x$ for which $f'(x) = \lim f_n'(x)$.

Exercise  6.3.4.  Let 

sin  (nx) 


Show that $h_n \to 0$ uniformly on $\mathbf{R}$ but that the sequence of derivatives $(h_n')$ diverges for every $x \in \mathbf{R}$.


Exercise 6.3.5. Let

$$g_n(x) = \frac{nx + x^2}{2n},$$

and set $g(x) = \lim g_n(x)$. Show that $g$ is differentiable in two ways:

(a) Compute $g(x)$ by algebraically taking the limit as $n \to \infty$ and then find $g'(x)$.

(b) Compute $g_n'(x)$ for each $n \in \mathbf{N}$ and show that the sequence of derivatives $(g_n')$ converges uniformly on every interval $[-M, M]$. Use Theorem 6.3.3 to conclude $g'(x) = \lim g_n'(x)$.

(c) Repeat parts (a) and (b) for the sequence $f_n(x) = (nx^2 + 1)/(2n + x)$.


Exercise  6.3.6.  Provide  an  example  or  explain  why  the  request  is  impossible. 
Let’s  take  the  domain  of  the  functions  to  be  all  of  R. 


(a) A sequence $(f_n)$ of nowhere differentiable functions with $f_n \to f$ uniformly and $f$ everywhere differentiable.

(b) A sequence $(f_n)$ of differentiable functions such that $(f_n')$ converges uniformly but the original sequence $(f_n)$ does not converge for any $x \in \mathbf{R}$.

(c) A sequence $(f_n)$ of differentiable functions such that both $(f_n)$ and $(f_n')$ converge uniformly but $f = \lim f_n$ is not differentiable at some point.


Exercise 6.3.7. Use the Mean Value Theorem to supply a proof for Theorem 6.3.2. To get started, observe that the triangle inequality implies that, for any $x \in [a, b]$ and $m, n \in \mathbf{N}$,

$$|f_n(x) - f_m(x)| \le |(f_n(x) - f_m(x)) - (f_n(x_0) - f_m(x_0))| + |f_n(x_0) - f_m(x_0)|.$$

6.4 Series of Functions

Definition 6.4.1. For each $n \in \mathbf{N}$, let $f_n$ and $f$ be functions defined on a set $A \subseteq \mathbf{R}$. The infinite series

$$\sum_{n=1}^{\infty} f_n(x) = f_1(x) + f_2(x) + f_3(x) + \cdots$$

converges pointwise on $A$ to $f(x)$ if the sequence $s_k(x)$ of partial sums defined by

$$s_k(x) = f_1(x) + f_2(x) + \cdots + f_k(x)$$

converges pointwise to $f(x)$. The series converges uniformly on $A$ to $f$ if the sequence $s_k(x)$ converges uniformly on $A$ to $f(x)$.

In either case, we write $f = \sum_{n=1}^{\infty} f_n$ or $f(x) = \sum_{n=1}^{\infty} f_n(x)$, always being explicit about the type of convergence involved.

If we have a series $\sum_{n=1}^{\infty} f_n$ where the functions $f_n$ are continuous, then the Algebraic Continuity Theorem (Theorem 4.3.4) guarantees that the partial sums, because they are finite sums, will be continuous as well. A corresponding observation is true if we are dealing with differentiable functions. As a consequence, we can immediately translate the results for sequences in the previous sections into statements about the behavior of infinite series of functions.

Theorem 6.4.2 (Term-by-term Continuity Theorem). Let $f_n$ be continuous functions defined on a set $A \subseteq \mathbf{R}$, and assume $\sum_{n=1}^{\infty} f_n$ converges uniformly on $A$ to a function $f$. Then, $f$ is continuous on $A$.

Proof. Apply the Continuous Limit Theorem (Theorem 6.2.6) to the partial sums $s_k = f_1 + f_2 + \cdots + f_k$. □

Theorem 6.4.3 (Term-by-term Differentiability Theorem). Let $f_n$ be differentiable functions defined on an interval $A$, and assume $\sum_{n=1}^{\infty} f_n'(x)$ converges uniformly to a limit $g(x)$ on $A$. If there exists a point $x_0 \in A$ where $\sum_{n=1}^{\infty} f_n(x_0)$ converges, then the series $\sum_{n=1}^{\infty} f_n(x)$ converges uniformly to a differentiable function $f(x)$ satisfying $f'(x) = g(x)$ on $A$. In other words,

$$f(x) = \sum_{n=1}^{\infty} f_n(x) \quad \text{and} \quad f'(x) = \sum_{n=1}^{\infty} f_n'(x).$$

Proof. Apply the stronger form of the Differentiable Limit Theorem (Theorem 6.3.3) to the partial sums $s_k = f_1 + f_2 + \cdots + f_k$. Observe that Theorem 5.2.4 implies that $s_k' = f_1' + f_2' + \cdots + f_k'$. □

In the vocabulary of infinite series, the Cauchy Criterion takes the following form.

Theorem 6.4.4 (Cauchy Criterion for Uniform Convergence of Series). A series $\sum_{n=1}^{\infty} f_n$ converges uniformly on $A \subseteq \mathbf{R}$ if and only if for every $\epsilon > 0$ there exists an $N \in \mathbf{N}$ such that

$$|f_{m+1}(x) + f_{m+2}(x) + f_{m+3}(x) + \cdots + f_n(x)| < \epsilon$$

whenever $n > m \ge N$ and $x \in A$.


The benefits of uniform convergence over pointwise convergence suggest the need for some ways of determining when a series converges uniformly. The following corollary to the Cauchy Criterion is the most common such tool. In particular, it will be quite useful in our upcoming investigations of power series.

Corollary 6.4.5 (Weierstrass M-Test). For each $n \in \mathbf{N}$, let $f_n$ be a function defined on a set $A \subseteq \mathbf{R}$, and let $M_n > 0$ be a real number satisfying

$$|f_n(x)| \le M_n$$

for all $x \in A$. If $\sum_{n=1}^{\infty} M_n$ converges, then $\sum_{n=1}^{\infty} f_n$ converges uniformly on $A$.

Proof. Exercise 6.4.1. □
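The mechanism behind the Weierstrass M-Test can be seen in a small numerical experiment. The sketch below is an illustration under stated assumptions, not a proof: the example series $f_k(x) = \sin(kx)/k^2$ with bounds $M_k = 1/k^2$, the sample grid on $[-10, 10]$, and the helper names are all choices made here. It compares a block of terms of the function series, maximized over the grid, with the corresponding block of the numerical series $\sum M_k$.

```python
import math


def block_sum(x, m, n):
    """Return f_{m+1}(x) + ... + f_n(x) for f_k(x) = sin(k x) / k**2."""
    return sum(math.sin(k * x) / k ** 2 for k in range(m + 1, n + 1))


def bound_block(m, n):
    """Return M_{m+1} + ... + M_n for the bounds M_k = 1 / k**2."""
    return sum(1.0 / k ** 2 for k in range(m + 1, n + 1))


def sup_block(m, n, points=2001):
    """Largest sampled value of |f_{m+1}(x) + ... + f_n(x)| on [-10, 10]."""
    grid = [-10.0 + 20.0 * j / (points - 1) for j in range(points)]
    return max(abs(block_sum(x, m, n)) for x in grid)


if __name__ == "__main__":
    for m, n in ((5, 50), (20, 200)):
        print(m, n, sup_block(m, n), bound_block(m, n))
```

Because the bound on each block does not depend on $x$, making the numerical block small (via the Cauchy Criterion for $\sum M_n$) makes the function block small at every sampled point simultaneously, which is the pattern Theorem 6.4.4 asks for.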


Exercises 

Exercise  6.4.1.  Supply  the  details  for  the  proof  of  the  Weierstrass  M-Test 
(Corollary  6.4.5). 

Exercise  6.4.2.  Decide  whether  each  proposition  is  true  or  false,  providing  a 
short  justification  or  counterexample  as  appropriate. 

(a) If $\sum_{n=1}^{\infty} g_n$ converges uniformly, then $(g_n)$ converges uniformly to zero.

(b) If $0 \le f_n(x) \le g_n(x)$ and $\sum_{n=1}^{\infty} g_n$ converges uniformly, then $\sum_{n=1}^{\infty} f_n$ converges uniformly.

(c) If $\sum_{n=1}^{\infty} f_n$ converges uniformly on $A$, then there exist constants $M_n$ such that $|f_n(x)| \le M_n$ for all $x \in A$ and $\sum_{n=1}^{\infty} M_n$ converges.

Exercise 6.4.3. (a) Show that

$$g(x) = \sum_{n=0}^{\infty} \frac{\cos(2^n x)}{2^n}$$

is continuous on all of $\mathbf{R}$.

(b) The function $g$ was cited in Section 5.4 as an example of a continuous nowhere differentiable function. What happens if we try to use Theorem 6.4.3 to explore whether $g$ is differentiable?

Exercise 6.4.4. Define

$$g(x) = \sum_{n=0}^{\infty} \frac{x^{2n}}{1 + x^{2n}}.$$

Find the values of $x$ where the series converges and show that we get a continuous function on this set.

Exercise 6.4.5. (a) Prove that

$$h(x) = \sum_{n=1}^{\infty} \frac{x^n}{n^2} = x + \frac{x^2}{4} + \frac{x^3}{9} + \frac{x^4}{16} + \cdots$$

is continuous on $[-1, 1]$.

(b) The series

$$f(x) = \sum_{n=1}^{\infty} \frac{x^n}{n} = x + \frac{x^2}{2} + \frac{x^3}{3} + \frac{x^4}{4} + \cdots$$

converges for every $x$ in the half-open interval $[-1, 1)$ but does not converge when $x = 1$. For a fixed $x_0 \in (-1, 1)$, explain how we can still use the Weierstrass M-Test to prove that $f$ is continuous at $x_0$.


Exercise 6.4.6. Let

$$f(x) = \frac{1}{x} - \frac{1}{x+1} + \frac{1}{x+2} - \frac{1}{x+3} + \frac{1}{x+4} - \cdots.$$

Show $f$ is defined for all $x > 0$. Is $f$ continuous on $(0, \infty)$? How about differentiable?


Exercise 6.4.7. Let

$$f(x) = \sum_{k=1}^{\infty} \frac{\sin(kx)}{k^3}.$$

(a) Show that $f(x)$ is differentiable and that the derivative $f'(x)$ is continuous.

(b) Can we determine if $f$ is twice-differentiable?

Exercise 6.4.8. Consider the function

$$f(x) = \sum_{k=1}^{\infty} \frac{\sin(x/k)}{k}.$$

Where is $f$ defined? Continuous? Differentiable? Twice-differentiable?


Exercise 6.4.9. Let

$$h(x) = \sum_{n=1}^{\infty} \frac{1}{x^2 + n^2}.$$

(a) Show that $h$ is a continuous function defined on all of $\mathbf{R}$.

(b) Is $h$ differentiable? If so, is the derivative function $h'$ continuous?

Exercise 6.4.10. Let $\{r_1, r_2, r_3, \ldots\}$ be an enumeration of the set of rational numbers. For each $r_n \in \mathbf{Q}$, define

$$u_n(x) = \begin{cases} 1/2^n & \text{for } x > r_n \\ 0 & \text{for } x \le r_n. \end{cases}$$

Now, let $h(x) = \sum_{n=1}^{\infty} u_n(x)$. Prove that $h$ is a monotone function defined on all of $\mathbf{R}$ that is continuous at every irrational point.


6.5  Power  Series 


It is time to put some mathematical teeth into our understanding of functions expressed in the form of a power series; that is, functions of the form

$$f(x) = \sum_{n=0}^{\infty} a_n x^n = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots.$$

The first order of business is to determine the points $x \in \mathbf{R}$ for which the resulting series on the right-hand side converges. This set certainly contains $x = 0$, and, as the next result demonstrates, it takes a very predictable form.


Theorem 6.5.1. If a power series $\sum_{n=0}^{\infty} a_n x^n$ converges at some point $x_0 \in \mathbf{R}$, then it converges absolutely for any $x$ satisfying $|x| < |x_0|$.

Proof. If $\sum_{n=0}^{\infty} a_n x_0^n$ converges, then the sequence of terms $(a_n x_0^n)$ is bounded. (In fact, it converges to 0.) Let $M > 0$ satisfy $|a_n x_0^n| \le M$ for all $n \in \mathbf{N}$. If $x \in \mathbf{R}$ satisfies $|x| < |x_0|$, then

$$|a_n x^n| = |a_n x_0^n| \left| \frac{x}{x_0} \right|^n \le M \left| \frac{x}{x_0} \right|^n.$$

But notice that

$$\sum_{n=0}^{\infty} M \left| \frac{x}{x_0} \right|^n$$

is a geometric series with ratio $|x/x_0| < 1$ and so converges. By the Comparison Test, $\sum_{n=0}^{\infty} a_n x^n$ converges absolutely. □


The main implication of Theorem 6.5.1 is that the set of points for which a given power series converges must necessarily be $\{0\}$, $\mathbf{R}$, or a bounded interval centered around $x = 0$. Because of the strict inequality in Theorem 6.5.1, there is some ambiguity about the endpoints of the interval, and it is possible that the set of convergent points may be of the form $(-R, R)$, $[-R, R)$, $(-R, R]$, or $[-R, R]$.

The value of $R$ is referred to as the radius of convergence of a power series, and it is customary to assign $R$ the value 0 or $\infty$ to represent the set $\{0\}$ or $\mathbf{R}$, respectively. Some of the standard devices for computing the radius of convergence for a power series are explored in the exercises. Of more interest to us here is the investigation of the properties of functions defined in this way. Are they continuous? Are they differentiable? If so, can we differentiate the series term-by-term? What happens at the endpoints?


Establishing  Uniform  Convergence 

The  positive  answers  to  the  preceding  questions,  and  the  usefulness  of  power 
series  in  general,  are  largely  due  to  the  fact  that  they  converge  uniformly  on 
compact  sets  contained  in  their  domain  of  convergent  points.  As  we  are  about  to 
see,  a  complete  proof  of  this  fact  requires  a  fairly  delicate  argument  attributed 
to  the  Norwegian  mathematician  Niels  Henrik  Abel.  A  significant  amount  of 
progress,  however,  can  be  made  with  the  Weierstrass  M-Test  (Corollary  6.4.5). 


Theorem 6.5.2. If a power series $\sum_{n=0}^{\infty} a_n x^n$ converges absolutely at a point $x_0$, then it converges uniformly on the closed interval $[-c, c]$, where $c = |x_0|$.

Proof. This proof requires a straightforward application of the Weierstrass M-Test. The details are requested in Exercise 6.5.3. □


For many applications, Theorem 6.5.2 is good enough. For instance, because any $x \in (-R, R)$ is contained in the interior of a closed interval $[-c, c] \subseteq (-R, R)$, it now follows that a power series that converges on an open interval is necessarily continuous on this interval.

But what happens if we know that a series converges at an endpoint of its interval of convergence? Does the good behavior of the series on $(-R, R)$ necessarily extend to the endpoint $x = R$? If the convergence of the series at $x = R$ is absolute convergence, then we can again rely on Theorem 6.5.2 to conclude that the series converges uniformly on the set $[-R, R]$. The remaining interesting open question is what happens if a series converges conditionally at a point $x = R$. We may still use Theorem 6.5.1 to conclude that we have pointwise convergence on the interval $(-R, R]$, but more work is needed to establish uniform convergence on compact sets containing $x = R$.


Abel’s  Theorem 


We should remark that if the power series $g(x) = \sum_{n=0}^{\infty} a_n x^n$ converges conditionally at $x = R$, then it is possible for it to diverge when $x = -R$. The series

$$\sum_{n=1}^{\infty} \frac{(-1)^n x^n}{n}$$

with $R = 1$ is an example. To keep our attention fixed on the convergent endpoint, we will prove uniform convergence on the set $[0, R]$.

The first step in the argument is an estimate that should be compared to Abel's Test for convergence of series, developed back in Chapter 2 (Exercise 2.7.13).

Lemma 6.5.3 (Abel's Lemma). Let $b_n$ satisfy $b_1 \ge b_2 \ge b_3 \ge \cdots \ge 0$, and let $\sum_{n=1}^{\infty} a_n$ be a series for which the partial sums are bounded. In other words, assume there exists $A > 0$ such that

$$|a_1 + a_2 + \cdots + a_n| \le A$$

for all $n \in \mathbf{N}$. Then, for all $n \in \mathbf{N}$,

$$|a_1 b_1 + a_2 b_2 + a_3 b_3 + \cdots + a_n b_n| \le A b_1.$$

Proof. Let $s_n = a_1 + a_2 + \cdots + a_n$. Using the summation-by-parts formula derived in Exercise 2.7.12, we can write

$$\left| \sum_{k=1}^{n} a_k b_k \right| = \left| s_n b_{n+1} + \sum_{k=1}^{n} s_k (b_k - b_{k+1}) \right| \le A b_{n+1} + \sum_{k=1}^{n} A (b_k - b_{k+1}) = A b_{n+1} + (A b_1 - A b_{n+1}) = A b_1.$$

□
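Both the summation-by-parts identity and the bound in Abel's Lemma can be checked exactly on a finite example. The sketch below is an illustration only; the choice $a_k = (-1)^{k+1}$, $b_k = 1/k$ and the helper name `abel_check` are assumptions made for the example, and $A$ is taken to be the largest partial sum actually observed among the first $n$ terms.

```python
from fractions import Fraction


def abel_check(a, b):
    """Compare both sides of summation by parts for finite lists.

    a holds a_1, ..., a_n and b holds b_1, ..., b_{n+1}.
    Returns (lhs, rhs, bound) where
      lhs   = a_1 b_1 + ... + a_n b_n,
      rhs   = s_n b_{n+1} + sum_{k=1}^{n} s_k (b_k - b_{k+1}),
      bound = A * b_1 with A = max |s_k| over k = 1, ..., n.
    """
    n = len(a)
    partial = []              # partial[k - 1] stores s_k = a_1 + ... + a_k
    running = Fraction(0)
    for term in a:
        running += term
        partial.append(running)
    A = max(abs(s) for s in partial)
    lhs = sum(a[k] * b[k] for k in range(n))
    rhs = partial[n - 1] * b[n] + sum(
        partial[k] * (b[k] - b[k + 1]) for k in range(n)
    )
    return lhs, rhs, A * b[0]


def alternating_example(n):
    """Build a_k = (-1)**(k+1) and b_k = 1/k and run the check."""
    a = [Fraction((-1) ** (k + 1)) for k in range(1, n + 1)]
    b = [Fraction(1, k) for k in range(1, n + 2)]
    return abel_check(a, b)


if __name__ == "__main__":
    lhs, rhs, bound = alternating_example(50)
    print(float(lhs), float(rhs), float(bound))
```

Here the terms $a_k$ do not form an absolutely summable series, yet the weighted sum stays below $A b_1$, which is the feature of the lemma that the triangle inequality alone cannot deliver.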


It  is  worth  observing  that  if  A  were  an  upper  bound  on  the  partial  sums 
of  \an\  (note  the  absolute  value  bars),  then  the  proof  of  Lemma  6.5.3  would 
be  a  simple  exercise  in  the  triangle  inequality.  The  point  of  the  matter  is  that 
because  we  are  only  assuming  conditional  convergence,  the  triangle  inequality 
is  not  going  to  be  of  any  use  in  proving  Abel’s  Theorem,  but  we  are  now  in 
possession  of  an  inequality  that  we  can  use  in  its  place. 

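Theorem 6.5.4 (Abel's Theorem). Let $g(x) = \sum_{n=0}^{\infty} a_n x^n$ be a power series that converges at the point $x = R > 0$. Then the series converges uniformly on the interval $[0, R]$. A similar result holds if the series converges at $x = -R$.

Proof. To set the stage for an application of Lemma 6.5.3, we first write

$$g(x) = \sum_{n=0}^{\infty} a_n x^n = \sum_{n=0}^{\infty} (a_n R^n) \left( \frac{x}{R} \right)^n.$$

Let $\epsilon > 0$. By the Cauchy Criterion for Uniform Convergence of Series (Theorem 6.4.4), we will be done if we can produce an $N$ such that $n > m \ge N$ implies

$$(1) \qquad \left| (a_{m+1} R^{m+1}) \left( \frac{x}{R} \right)^{m+1} + (a_{m+2} R^{m+2}) \left( \frac{x}{R} \right)^{m+2} + \cdots + (a_n R^n) \left( \frac{x}{R} \right)^n \right| < \epsilon.$$

Because we are assuming that $\sum_{n=0}^{\infty} a_n R^n$ converges, the Cauchy Criterion for convergent series of real numbers guarantees that there exists an $N$ such that

$$|a_{m+1} R^{m+1} + a_{m+2} R^{m+2} + \cdots + a_n R^n| < \frac{\epsilon}{2}$$

whenever $n > m \ge N$. But now, for any fixed $m \in \mathbf{N}$, we can apply Abel's Lemma (Lemma 6.5.3) to the sequences obtained by omitting the first $m$ terms. Using $\epsilon/2$ as a bound on the partial sums of $\sum_{j=1}^{\infty} a_{m+j} R^{m+j}$ and observing that $(x/R)^{m+j}$ is monotone decreasing, an application of Abel's Lemma to equation (1) yields

$$\left| (a_{m+1} R^{m+1}) \left( \frac{x}{R} \right)^{m+1} + (a_{m+2} R^{m+2}) \left( \frac{x}{R} \right)^{m+2} + \cdots + (a_n R^n) \left( \frac{x}{R} \right)^n \right| \le \frac{\epsilon}{2} \left( \frac{x}{R} \right)^{m+1} < \epsilon.$$

□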
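Abel's Theorem can be illustrated with the series $\sum_{n=1}^{\infty} (-1)^n x^n / n$, which converges only conditionally at $x = 1$. The sketch below is a numerical illustration, not a proof: the grid of 201 sample points in $[0, 1]$ and the helper names are assumptions made for the example, and the comparison value $1/(m+1)$ comes from the standard alternating series estimate rather than from Abel's Lemma.

```python
def block(x, m, n):
    """Return the sum of (-1)**k * x**k / k for k = m + 1, ..., n."""
    return sum((-1) ** k * x ** k / k for k in range(m + 1, n + 1))


def sup_block(m, n, points=201):
    """Largest sampled value of |block(x, m, n)| on a grid in [0, 1].

    The grid includes the endpoint x = 1, where convergence is only
    conditional."""
    grid = [j / (points - 1) for j in range(points)]
    return max(abs(block(x, m, n)) for x in grid)


if __name__ == "__main__":
    for m in (10, 100):
        print(m, sup_block(m, m + 500), 1 / (m + 1))
```

The sampled maximum of each block shrinks as $m$ grows, uniformly across the grid including the endpoint, which is the behavior the Cauchy Criterion (Theorem 6.4.4) requires on $[0, R]$.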


The  Success  of  Power  Series 

An  economical  way  to  summarize  the  conclusions  of  Theorem  6.5.2  and  Abel’s 
Theorem  is  with  the  following  statement. 

Theorem 6.5.5. If a power series converges pointwise on the set $A \subseteq \mathbf{R}$, then it converges uniformly on any compact set $K \subseteq A$.

Proof. A compact set contains both a maximum $x_1$ and a minimum $x_0$, which by hypothesis must be in $A$. Abel's Theorem implies the series converges uniformly on the interval $[x_0, x_1]$ and thus also on $K$. □

This fact leads to the desirable conclusion that a power series is continuous at every point at which it converges. To make an argument for differentiability, we would like to appeal to Theorem 6.4.3; however, this result has a slightly more involved set of hypotheses. In order to conclude that a power series $\sum_{n=0}^{\infty} a_n x^n$ is differentiable, and that term-by-term differentiation is allowed, we need to know beforehand that the differentiated series $\sum_{n=1}^{\infty} n a_n x^{n-1}$ converges uniformly.

Theorem 6.5.6. If $\sum_{n=0}^{\infty} a_n x^n$ converges for all $x \in (-R, R)$, then the differentiated series $\sum_{n=1}^{\infty} n a_n x^{n-1}$ converges at each $x \in (-R, R)$ as well. Consequently, the convergence is uniform on compact sets contained in $(-R, R)$.

Proof. Exercise 6.5.5. □

We should point out that it is possible for a series to converge at an endpoint $x = R$ but for the differentiated series to diverge at this point. The series $\sum_{n=1}^{\infty} x^n/n$ has this property when $x = -1$. On the other hand, if the differentiated series does converge at the point $x = R$, then Abel's Theorem applies and the convergence of the differentiated series is uniform on compact sets that contain $R$.

With all the pieces in place, we summarize the impressive conclusions of this section.

Theorem 6.5.7. Assume

$$f(x) = \sum_{n=0}^{\infty} a_n x^n$$

converges on an interval $A \subseteq \mathbf{R}$. The function $f$ is continuous on $A$ and differentiable on any open interval $(-R, R) \subseteq A$. The derivative is given by

$$f'(x) = \sum_{n=1}^{\infty} n a_n x^{n-1}.$$

Moreover, $f$ is infinitely differentiable on $(-R, R)$, and the successive derivatives can be obtained via term-by-term differentiation of the appropriate series.

Proof. The details for why $f$ is continuous have been discussed. Theorem 6.5.6 justifies the application of the Term-by-term Differentiability Theorem (Theorem 6.4.3), which verifies the formula for $f'$.

A differentiated power series is a power series in its own right, and Theorem 6.5.6 implies that, although the series may no longer converge at a particular endpoint, the radius of convergence does not change. By induction, then, power series are differentiable an infinite number of times. □
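Term-by-term differentiation is easy to test on the geometric series $\sum_{n=0}^{\infty} x^n = 1/(1-x)$, valid for $|x| < 1$. The sketch below is an illustration under assumptions chosen for the example (the sample point $x = 0.3$, a cutoff of 200 terms, and the helper name `deriv_partial_sum`): it sums the differentiated series $\sum_{n=1}^{\infty} n x^{n-1}$ and compares the result with the derivative $1/(1-x)^2$ of the closed form.

```python
def deriv_partial_sum(x, terms):
    """Partial sum of n * x**(n - 1) for n = 1, ..., terms."""
    return sum(n * x ** (n - 1) for n in range(1, terms + 1))


def closed_form_derivative(x):
    """Derivative of 1 / (1 - x), namely 1 / (1 - x)**2, for |x| < 1."""
    return 1.0 / (1.0 - x) ** 2


if __name__ == "__main__":
    x = 0.3
    print(deriv_partial_sum(x, 200), closed_form_derivative(x))
```

The two values agree to within floating-point error at this interior point, as Theorem 6.5.7 predicts. The same comparison says nothing about the endpoints $x = \pm 1$, where the geometric series itself diverges.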


Exercises 


Exercise  6.5.1.  Consider  the  function  g  defined  by  the  power  series 


(a) Is $g$ defined on $(-1, 1)$? Is it continuous on this set? Is $g$ defined on $(-1, 1]$? Is it continuous on this set? What happens on $[-1, 1]$? Can the power series for $g(x)$ possibly converge for any other points $|x| > 1$? Explain.

(b) For what values of $x$ is $g'(x)$ defined? Find a formula for $g'$.


Exercise 6.5.2. Find suitable coefficients $(a_n)$ so that the resulting power series $\sum a_n x^n$ has the given properties, or explain why such a request is impossible.

(a) Converges for every value of $x \in \mathbf{R}$.

(b) Diverges for every value of $x \in \mathbf{R}$.

(c) Converges absolutely for all $x \in [-1,1]$ and diverges off of this set.

(d) Converges conditionally at $x = -1$ and converges absolutely at $x = 1$.

(e) Converges conditionally at both $x = -1$ and $x = 1$.

Exercise  6.5.3.  Use  the  Weierstrass  M-Test  to  prove  Theorem  6.5.2. 

Exercise 6.5.4 (Term-by-term Antidifferentiation). Assume $f(x) = \sum_{n=0}^{\infty} a_n x^n$ converges on $(-R, R)$.

(a) Show

\[ F(x) = \sum_{n=0}^{\infty} \frac{a_n}{n+1} x^{n+1} \]

is defined on $(-R, R)$ and satisfies $F'(x) = f(x)$.

(b) Antiderivatives are not unique. If $g$ is an arbitrary function satisfying $g'(x) = f(x)$ on $(-R, R)$, find a power series representation for $g$.

Exercise 6.5.5. (a) If $s$ satisfies $0 < s < 1$, show $n s^{n-1}$ is bounded for all $n \ge 1$.

(b) Given an arbitrary $x \in (-R, R)$, pick $t$ to satisfy $|x| < t < R$. Use this start to construct a proof for Theorem 6.5.6.


Exercise 6.5.6. Previous work on geometric series (Example 2.7.5) justifies the formula

\[ \frac{1}{1-x} = 1 + x + x^2 + x^3 + x^4 + \cdots \quad \text{for all } |x| < 1. \]

Use the results about power series proved in this section to find values for $\sum_{n=1}^{\infty} n/2^n$ and $\sum_{n=1}^{\infty} n^2/2^n$. The discussion in Section 6.1 may be helpful.

Exercise 6.5.7. Let $\sum a_n x^n$ be a power series with $a_n \neq 0$, and assume

\[ L = \lim_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right| \]

exists.

(a) Show that if $L \neq 0$, then the series converges for all $x$ in $(-1/L, 1/L)$. (The advice in Exercise 2.7.9 may be helpful.)

(b) Show that if $L = 0$, then the series converges for all $x \in \mathbf{R}$.

(c) Show that (a) and (b) continue to hold if $L$ is replaced by the limit

\[ L' = \lim_{n \to \infty} s_n \quad \text{where} \quad s_n = \sup\left\{ \left| \frac{a_{k+1}}{a_k} \right| : k \ge n \right\}. \]

(General properties of the limit superior are discussed in Exercise 2.4.7.)
Exercise 6.5.8. (a) Show that power series representations are unique. If we have

\[ \sum_{n=0}^{\infty} a_n x^n = \sum_{n=0}^{\infty} b_n x^n \]

for all $x$ in an interval $(-R, R)$, prove that $a_n = b_n$ for all $n = 0, 1, 2, \ldots$.

(b) Let $f(x) = \sum_{n=0}^{\infty} a_n x^n$ converge on $(-R, R)$, and assume $f'(x) = f(x)$ for all $x \in (-R, R)$ and $f(0) = 1$. Deduce the values of $a_n$.

Exercise  6.5.9.  Review  the  definitions  and  results  from  Section  2.8  concerning 
products  of  series  and  Cauchy  products  in  particular.  At  the  end  of  Section  2.9, 
we mentioned the following result: If both $\sum a_n$ and $\sum b_n$ converge conditionally to $A$ and $B$ respectively, then it is possible for the Cauchy product,

\[ \sum d_n \quad \text{where} \quad d_n = a_0 b_n + a_1 b_{n-1} + \cdots + a_n b_0, \]

to diverge. However, if $\sum d_n$ does converge, then it must converge to $AB$. To prove this, set

\[ f(x) = \sum a_n x^n, \quad g(x) = \sum b_n x^n, \quad \text{and} \quad h(x) = \sum d_n x^n. \]

Use  Abel’s  Theorem  and  the  result  in  Exercise  2.8.7  to  establish  this  result. 

Exercise 6.5.10. Let $g(x) = \sum_{n=0}^{\infty} b_n x^n$ converge on $(-R, R)$, and assume $(x_n) \to 0$ with $x_n \neq 0$. If $g(x_n) = 0$ for all $n \in \mathbf{N}$, show that $g(x)$ must be identically zero on all of $(-R, R)$.

Exercise 6.5.11. A series $\sum_{n=0}^{\infty} a_n$ is said to be Abel-summable to $L$ if the power series

\[ f(x) = \sum_{n=0}^{\infty} a_n x^n \]

converges for all $x \in [0,1)$ and $L = \lim_{x \to 1^-} f(x)$.

(a) Show that any series that converges to a limit $L$ is also Abel-summable to $L$.

(b) Show that $\sum_{n=0}^{\infty} (-1)^n$ is Abel-summable and find the sum.

6.6  Taylor  Series 

Our  study  of  power  series  has  led  to  some  enthusiastic  conclusions  about  the 
nature  of  functions  of  the  form 

\[ f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + \cdots. \]

Despite  their  infinite  character,  power  series  can  be  manipulated  more  or  less  as 
though  they  are  polynomials.  On  its  interval  of  convergence,  a  power  series  is 
continuous and infinitely differentiable, and successive derivatives or antiderivatives can be computed by performing the desired operation on each individual term in the series, just as it is done for polynomials.

In Section 6.1 we informally encountered the powerful idea that familiar functions such as $\arctan(x)$ and $\sqrt{1+x}$ can be represented as power series. This is a game-changing revelation. If a function can be represented as a power series, and a power series can be treated like a polynomial, then vast new possibilities are suddenly available for the kinds of calculations that can be undertaken. Given this state of affairs, it is natural to wonder whether all of the well-behaved (i.e., infinitely differentiable) functions of calculus might have representations as power series.

In the examples and exercises in this section, we will assume the familiar properties of the trigonometric, inverse trigonometric, exponential, and logarithmic functions. Rigorously defining these functions is an important exercise in analysis. In fact, one of the most common methods for providing proper definitions is through power series, a point of view that is explored in Section 8.4. The point of this discussion, however, is to come at this question from the other direction. Assuming we are in possession of an infinitely differentiable function such as $\sin(x)$, can we find suitable coefficients $a_n$ so that

\[ \sin(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + \cdots \]

for at least some nonzero values of $x$?

Manipulating  Series 

In Section 6.1 we generated several new series representations starting from the formula

\[ \frac{1}{1-x} = 1 + x + x^2 + x^3 + x^4 + \cdots \quad \text{for all } |x| < 1, \tag{1} \]

proved in Example 2.7.5. At the time, we were not concerned with supplying rigorous proofs, but we have since done the bulk of the work necessary to confidently assert that the manipulations in Section 6.1 are perfectly valid.

Example 6.6.1. Theorem 6.5.7 applied to equation (1) gives

\[ \frac{1}{(1-x)^2} = 1 + 2x + 3x^2 + 4x^3 + 5x^4 + \cdots, \quad \text{for all } |x| < 1. \]

What about the series we generated for $\arctan(x)$? The substitution of $-x^2$ for $x$ in (1) doesn’t cause any problem:

\[ \frac{1}{1+x^2} = 1 - x^2 + x^4 - x^6 + x^8 - \cdots, \quad \text{for all } |x| < 1. \]
The content of Exercise 6.5.4 is that we can take the term-by-term antiderivative of this series and arrive at an antiderivative for $1/(1+x^2)$. Noting that $\arctan(0) = 0$, it follows that

\[ \arctan(x) = x - \frac{1}{3}x^3 + \frac{1}{5}x^5 - \frac{1}{7}x^7 + \cdots, \]

for all $x \in (-1,1)$. In fact, this formula is also valid for $x = \pm 1$. (Exercise 6.6.1.) Similar methods can be used to find series representations for functions such as $\log(1+x)$ and $x/(1+x^2)^2$.
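As a quick numerical sanity check on the two series in Example 6.6.1, the following Python sketch compares partial sums with the closed forms at one sample point. The function names, the sample point $x = 0.5$, and the number of terms are illustrative assumptions, not part of the text; agreement at a single point is evidence, not a proof.

```python
import math

def geometric_derivative_partial(x, terms):
    """Partial sum of 1 + 2x + 3x^2 + ... using the given number of terms."""
    return sum((n + 1) * x**n for n in range(terms))

def arctan_partial(x, terms):
    """Partial sum of x - x^3/3 + x^5/5 - ... using the given number of terms."""
    return sum((-1)**k * x**(2 * k + 1) / (2 * k + 1) for k in range(terms))

x = 0.5  # any sample point with |x| < 1 would do
print(geometric_derivative_partial(x, 60), 1 / (1 - x)**2)
print(arctan_partial(x, 60), math.atan(x))
```

Moving the sample point closer to $\pm 1$ should make the partial sums converge more slowly, which is consistent with the radius of convergence being 1.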


Taylor’s  Formula  for  the  Coefficients 

Manipulating  old  series  to  produce  new  ones  was  a  well-honed  craft  in  the 
17th  and  18th  centuries,  but  there  also  emerged  a  formula  for  producing  the 
coefficients  from  “scratch” — a  recipe  for  generating  a  power  series  representation 
using  only  the  function  in  question  and  its  derivatives.  The  technique  is  named 
after  the  mathematician  Brook  Taylor  (1685-1731)  who  published  it  in  1715, 
although  it  was  certainly  known  previous  to  this  date. 

Given an infinitely differentiable function $f$ defined on some interval centered at zero, the idea is to assume that $f$ has a power series expansion and deduce what the coefficients must be.

Theorem 6.6.2 (Taylor’s Formula). Let

\[ f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5 + \cdots \tag{3} \]

be defined on some nontrivial interval centered at zero. Then,

\[ a_n = \frac{f^{(n)}(0)}{n!}. \]

Proof. Exercise 6.6.3. □


Let’s use Taylor’s formula to produce the so-called Taylor series for $\sin(x)$. For the constant term we get $a_0 = \sin(0) = 0$. Then, $a_1 = \cos(0) = 1$, $a_2 = -\sin(0)/2! = 0$, and $a_3 = -\cos(0)/3! = -1/3!$. Continuing on, we are led to the series

\[ x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots. \]
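The coefficient computation for $\sin(x)$ can be mechanized. The sketch below assumes, as the text does, the familiar fact that the derivatives of $\sin$ cycle through $\sin, \cos, -\sin, -\cos$; the helper name `sin_taylor_coefficient` is hypothetical and introduced only for illustration.

```python
from fractions import Fraction
from math import factorial

# Values at 0 of sin, cos, -sin, -cos: the derivatives of sin repeat with period 4.
DERIVATIVES_AT_ZERO = [0, 1, 0, -1]

def sin_taylor_coefficient(n):
    """Return a_n = f^(n)(0) / n! for f = sin, as an exact fraction."""
    return Fraction(DERIVATIVES_AT_ZERO[n % 4], factorial(n))

coeffs = [sin_taylor_coefficient(n) for n in range(8)]
print(coeffs)
```

Exact fractions are used so that the printed coefficients can be compared directly with $1$, $-1/3!$, $1/5!$, and $-1/7!$.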


So  can  we  say  that  this  series  equals  sin(x)?  Well,  we  need  to  be  very  clear  about 
what  we  have  proved  to  this  point.  To  derive  Taylor’s  formula,  we  assumed  that 
$f$ actually had a power series representation. The conclusion is that if $f$ can be expressed in the form

\[ f(x) = \sum_{n=0}^{\infty} a_n x^n, \]




then it must be that

\[ a_n = \frac{f^{(n)}(0)}{n!}. \]

But what about the converse question? Assume $f$ is infinitely differentiable in a neighborhood of zero. If we let

\[ a_n = \frac{f^{(n)}(0)}{n!}, \]

does the resulting series

\[ \sum_{n=0}^{\infty} a_n x^n \]

converge to $f(x)$ on some nontrivial set of points? Does it converge at all? If it does converge, we know that the limit function is an infinitely differentiable function whose derivatives at zero are exactly the same as the derivatives of $f$. Is it possible for this limit to be different from $f$? In other words, might the Taylor series of a function converge to the wrong thing?

Let

\[ S_N(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_N x^N. \]

The polynomial $S_N(x)$ is a partial sum of the Taylor series expansion for the function $f(x)$. Thus, we are interested in whether or not

\[ \lim_{N \to \infty} S_N(x) = f(x) \]

for some values of $x$ besides zero.


Lagrange’s  Remainder  Theorem 

A powerful tool for analyzing this question was provided by Joseph Louis Lagrange (1736-1813). The idea is to consider the difference

\[ E_N(x) = f(x) - S_N(x), \]

which represents the error between $f$ and the partial sum $S_N$.

Theorem 6.6.3 (Lagrange’s Remainder Theorem). Let $f$ be differentiable $N+1$ times on $(-R, R)$, define $a_n = f^{(n)}(0)/n!$ for $n = 0, 1, \ldots, N$, and let

\[ S_N(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_N x^N. \]

Given $x \neq 0$ in $(-R, R)$, there exists a point $c$ satisfying $|c| < |x|$ where the error function $E_N(x) = f(x) - S_N(x)$ satisfies

\[ E_N(x) = \frac{f^{(N+1)}(c)}{(N+1)!} x^{N+1}. \]
Before embarking on a proof, let’s examine the significance of this result. Proving $S_N(x) \to f(x)$ is equivalent to showing $E_N(x) \to 0$. There are three components to the expression for $E_N(x)$. In the denominator, we have $(N+1)!$, which helps to make $E_N$ small as $N$ tends to infinity. In the numerator, we have $x^{N+1}$, which potentially grows depending on the size of $x$. Thus, we should expect that a Taylor series is less likely to converge the farther $x$ is chosen from the origin. Finally, we have $f^{(N+1)}(c)$, which is a bit of a mystery. For functions with straightforward derivatives, this term can often be handled using a suitable upper bound.


Example 6.6.4. Consider the Taylor series for $\sin(x)$ generated earlier. How well does

\[ S_5(x) = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 \]

approximate $\sin(x)$ on the interval $[-2,2]$? Lagrange’s Remainder Theorem asserts that the difference between these two functions is

\[ E_5(x) = \sin(x) - S_5(x) = \frac{-\sin(c)}{6!} x^6 \]

for some $c$ in the interval $(-|x|, |x|)$. Not knowing the value of $c$, we can still be quite certain that $|\sin(c)| \le 1$. Because $x \in [-2,2]$, we have

\[ |E_5(x)| \le \frac{2^6}{6!} \approx .089. \]


To prove that $S_N(x)$ converges uniformly to $\sin(x)$ on $[-2,2]$, we observe that the $f^{(N+1)}(c)$ term in the Lagrange formula will never exceed 1 in absolute value. Thus,

\[ |E_N(x)| = \left| \frac{f^{(N+1)}(c)}{(N+1)!} x^{N+1} \right| \le \frac{1}{(N+1)!} 2^{N+1} \]

for $x \in [-2,2]$. Because factorials grow significantly faster than exponentials, it follows that $E_N(x) \to 0$ uniformly on $[-2,2]$.

Replacing  the  constant  2  with  an  arbitrary  constant  R  has  no  effect  on  the 
validity  of  the  argument,  and  so  the  Taylor  series  converges  uniformly  to  sin(x) 
on  every  interval  of  the  form  [— R,  R]. 
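The estimate in Example 6.6.4 is easy to check numerically. The following Python sketch evaluates the Lagrange bound $2^{N+1}/(N+1)!$ and compares it with the error actually observed on a sample grid in $[-2,2]$. The function names and the grid of 1001 points are assumptions made for illustration; a sampled maximum only approximates the true supremum.

```python
import math

def sin_partial_sum(x, N):
    """S_N(x) for sin: the Taylor terms of degree at most N (even-degree terms vanish)."""
    return sum((-1)**k * x**(2 * k + 1) / math.factorial(2 * k + 1)
               for k in range((N - 1) // 2 + 1))

def lagrange_bound(radius, N):
    """Bound radius^(N+1)/(N+1)! on |E_N(x)| for |x| <= radius, using |f^(N+1)(c)| <= 1."""
    return radius**(N + 1) / math.factorial(N + 1)

grid = [-2 + 4 * i / 1000 for i in range(1001)]
observed = max(abs(math.sin(x) - sin_partial_sum(x, 5)) for x in grid)

print("bound for N = 5:", lagrange_bound(2, 5))   # 2^6 / 6!
print("observed error: ", observed)
print("bound for N = 20:", lagrange_bound(2, 20))
```

The observed error should sit below the bound, since the bound replaces $|\sin(c)|$ by its worst case, and the bound for larger $N$ illustrates how quickly the factorial overtakes the exponential.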


Proof of Lagrange’s Remainder Theorem: The Taylor coefficients are chosen so that the function $f$ and the polynomial $S_N$ have the same derivatives at zero, at least up through the $N$th derivative, after which the derivatives of $S_N$ are identically zero. In other words, $f^{(n)}(0) = S_N^{(n)}(0)$ for all $0 \le n \le N$, which implies the error function $E_N(x) = f(x) - S_N(x)$ satisfies

\[ E_N^{(n)}(0) = 0 \quad \text{for all } n = 0, 1, 2, \ldots, N. \]


The  key  ingredient  in  this  argument  is  the  Generalized  Mean  Value  Theorem 
(Theorem  5.3.5)  from  Chapter  5.  To  simplify  notation,  let’s  assume  x  >  0  and 
apply the Generalized Mean Value Theorem to the functions $E_N(x)$ and $x^{N+1}$ on the interval $[0,x]$. Thus, there exists a point $x_1 \in (0,x)$ such that

\[ \frac{E_N(x)}{x^{N+1}} = \frac{E_N'(x_1)}{(N+1)x_1^N}. \]

Now apply the Generalized Mean Value Theorem to the functions $E_N'(x)$ and $(N+1)x^N$ on the interval $[0,x_1]$ to get that there exists a point $x_2 \in (0,x_1)$ where

\[ \frac{E_N(x)}{x^{N+1}} = \frac{E_N'(x_1)}{(N+1)x_1^N} = \frac{E_N''(x_2)}{(N+1)N x_2^{N-1}}. \]

Continuing in this manner we find

\[ \frac{E_N(x)}{x^{N+1}} = \frac{E_N^{(N+1)}(x_{N+1})}{(N+1)!} \]

where $x_{N+1} \in (0, x_N) \subseteq \cdots \subseteq (0,x)$. Now set $c = x_{N+1}$. Because $S_N^{(N+1)}(x) = 0$, we have $E_N^{(N+1)}(x) = f^{(N+1)}(x)$ and it follows that

\[ E_N(x) = \frac{f^{(N+1)}(c)}{(N+1)!} x^{N+1} \]

as desired.


□ 


Taylor Series Centered at $a \neq 0$.

Throughout this chapter we have focused our attention on series expansions centered at zero, but there is nothing special about zero other than notational simplicity. If $f$ is defined in some neighborhood of $a \in \mathbf{R}$ and infinitely differentiable at $a$, then the Taylor series expansion around $a$ takes the form

\[ \sum_{n=0}^{\infty} c_n (x-a)^n \quad \text{where} \quad c_n = \frac{f^{(n)}(a)}{n!}. \]

Setting $E_N(x) = f(x) - S_N(x)$ as usual, Lagrange’s Remainder Theorem in this case says that there exists a value $c$ between $a$ and $x$ where

\[ E_N(x) = \frac{f^{(N+1)}(c)}{(N+1)!} (x-a)^{N+1}. \]


In  Exercise  6.6.9,  we  derive  an  alternate  remainder  formula  due  to  Cauchy  that 
requires  these  more  general  expansions  for  its  derivation. 
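To see the centered form of Lagrange’s bound in action, here is a small Python sketch for $f(x) = e^x$ expanded around $a = 1$. It assumes the familiar fact that every derivative of $e^x$ is $e^x$, so that $c_n = e^a/n!$ and $|f^{(N+1)}(c)| \le e^{\max(a,x)}$ for $c$ between $a$ and $x$. The function names and the choices $a = 1$, $x = 2$, $N = 10$ are illustrative assumptions.

```python
import math

def exp_taylor_centered(x, a, N):
    """S_N(x) for f = exp centered at a, using c_n = f^(n)(a)/n! = e^a/n!."""
    return sum(math.exp(a) / math.factorial(n) * (x - a)**n for n in range(N + 1))

def lagrange_bound_centered(x, a, N):
    """Upper bound on |E_N(x)|: sup of e^c for c between a and x, times |x-a|^(N+1)/(N+1)!."""
    return math.exp(max(a, x)) * abs(x - a)**(N + 1) / math.factorial(N + 1)

a, x, N = 1.0, 2.0, 10
error = abs(math.exp(x) - exp_taylor_centered(x, a, N))
print("actual error:  ", error)
print("Lagrange bound:", lagrange_bound_centered(x, a, N))
```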
A Counterexample

Lagrange’s Remainder Theorem is extremely useful for determining how well the partial sums of the Taylor series approximate the original function, but it leaves unresolved the central question of whether or not the Taylor series necessarily converges to the function that generated it. The appearance of $f^{(N+1)}(c)$ in the error formula makes any general statement impossible. The Cauchy form of the remainder just mentioned provides another way to represent the error between the partial sum $S_N(x)$ and the function $f(x)$, and there are others still, but none lend themselves to a proof that $S_N \to f$. This is because no such proof exists! Let

\[ g(x) = \begin{cases} e^{-1/x^2} & \text{for } x \neq 0, \\ 0 & \text{for } x = 0. \end{cases} \]

Computing the Taylor coefficients for this function, it’s clear that $a_0 = g(0) = 0$. To compute $a_1$ we write

\[ a_1 = g'(0) = \lim_{x \to 0} \frac{g(x) - g(0)}{x - 0} = \lim_{x \to 0} \frac{e^{-1/x^2}}{x} = \lim_{x \to 0} \frac{1/x}{e^{1/x^2}}, \]

where both numerator and denominator tend to $\infty$ as $x$ approaches zero. Applying the $\infty/\infty$ version of L’Hospital’s Rule (Theorem 5.3.8) we see

\[ a_1 = \lim_{x \to 0} \frac{-1/x^2}{e^{1/x^2}(-2/x^3)} = \lim_{x \to 0} \frac{x}{2e^{1/x^2}} = 0. \]

This tells us that $g$ is flat at the origin. In Exercise 6.6.6, we outline the rest of the proof showing that $g^{(n)}(0) = 0$ for all $n \in \mathbf{N}$; in other words, $g$ is extremely flat at the origin.

The  implications  of  this  example  are  highly  significant.  The  function  g  is 
infinitely  differentiable,  and  every  one  of  its  Taylor  coefficients  is  equal  to  zero. 
By  default,  then,  its  Taylor  series  converges  uniformly  on  all  of  R  to  the  zero 
function. But other than at $x = 0$, $g(x)$ is never equal to zero. The Taylor series
for $g(x)$ converges, but it does not converge to $g(x)$ except at the center point
x  =  0.  The  unmistakable  conclusion  is  that  not  every  infinitely  differentiable 
function  can  be  represented  by  its  Taylor  series. 
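A few numerical values make the counterexample concrete. The Python sketch below takes for granted the claim (outlined in Exercise 6.6.6) that every Taylor coefficient of $g$ at zero vanishes, so each partial sum is the zero polynomial. The names `g` and `taylor_partial_sum` and the sample points are illustrative assumptions.

```python
import math

def g(x):
    """The flat function: exp(-1/x^2) away from zero, and 0 at zero."""
    return math.exp(-1.0 / x**2) if x != 0 else 0.0

def taylor_partial_sum(x, N):
    """Assumes g^(n)(0) = 0 for all n, so every partial sum S_N is identically zero."""
    return 0.0

for x in [0.1, 0.5, 1.0, 2.0]:
    # The gap g(x) - S_N(x) does not shrink as N grows: it is just g(x).
    print(x, g(x), g(x) - taylor_partial_sum(x, 50))

# The difference quotient g(h)/h is already tiny for modest h, reflecting g'(0) = 0.
print(g(0.1) / 0.1)
```

Near the origin the values of $g$ are so small that they are hard to distinguish from zero in floating point, which matches the description of $g$ as extremely flat there.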


Exercises 

Exercise  6.6.1.  The  derivation  in  Example  6.6.1  shows  the  Taylor  series  for 
$\arctan(x)$ is valid for all $x \in (-1,1)$. Notice, however, that the series also
converges  when  x  =  1.  Assuming  that  arctan(x)  is  continuous,  explain  why  the 
value  of  the  series  at  x  =  1  must  necessarily  be  arctan(l).  What  interesting 
identity  do  we  get  in  this  case? 

Exercise  6.6.2.  Starting  from  one  of  the  previously  generated  series  in  this 
section,  use  manipulations  similar  to  those  in  Example  6.6.1  to  find  Taylor 
series  representations  for  each  of  the  following  functions.  For  precisely  what 
values  of  x  is  each  series  representation  valid? 
(a) $x\cos(x^2)$

(b) $x/(1+4x^2)^2$

(c) $\log(1+x^2)$

Exercise  6.6.3.  Derive  the  formula  for  the  Taylor  coefficients  given  in 
Theorem  6.6.2. 


Exercise 6.6.4. Explain how Lagrange’s Remainder Theorem can be modified to prove

\[ 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots = \log(2). \]


Exercise 6.6.5. (a) Generate the Taylor coefficients for the exponential function $f(x) = e^x$, and then prove that the corresponding Taylor series converges uniformly to $e^x$ on any interval of the form $[-R, R]$.

(b) Verify the formula $f'(x) = e^x$.

(c) Use a substitution to generate the series for $e^{-x}$, and then informally calculate $e^x \cdot e^{-x}$ by multiplying together the two series and collecting common powers of $x$.


Exercise 6.6.6. Review the proof that $g'(0) = 0$ for the function

\[ g(x) = \begin{cases} e^{-1/x^2} & \text{for } x \neq 0, \\ 0 & \text{for } x = 0, \end{cases} \]

introduced at the end of this section.

(a) Compute $g'(x)$ for $x \neq 0$. Then use the definition of the derivative to find $g''(0)$.

(b) Compute $g''(x)$ and $g'''(x)$ for $x \neq 0$. Use these observations and invent whatever notation is needed to give a general description for the $n$th derivative $g^{(n)}(x)$ at points different from zero.

(c) Construct a general argument for why $g^{(n)}(0) = 0$ for all $n \in \mathbf{N}$.


Exercise  6.6.7.  Find  an  example  of  each  of  the  following  or  explain  why  no 
such  function  exists. 


(a) An infinitely differentiable function $g(x)$ on all of $\mathbf{R}$ with a Taylor series that converges to $g(x)$ only for $x \in (-1,1)$.

(b) An infinitely differentiable function $h(x)$ with the same Taylor series as $\sin(x)$ but such that $h(x) \neq \sin(x)$ for all $x \neq 0$.

(c) An infinitely differentiable function $f(x)$ on all of $\mathbf{R}$ with a Taylor series that converges to $f(x)$ if and only if $x \le 0$.
Exercise 6.6.8. Here is a weaker form of Lagrange’s Remainder Theorem whose proof is arguably more illuminating than the one for the stronger result.

(a) First establish a lemma: If $g$ and $h$ are differentiable on $[0,x]$ with $g(0) = h(0)$ and $g'(t) \le h'(t)$ for all $t \in [0,x]$, then $g(t) \le h(t)$ for all $t \in [0,x]$.

(b) Let $f$, $S_N$, and $E_N$ be as in Theorem 6.6.3, and take $0 < x < R$. If $|f^{(N+1)}(t)| \le M$ for all $t \in [0,x]$, show

\[ |E_N(x)| \le \frac{M x^{N+1}}{(N+1)!}. \]


Exercise 6.6.9 (Cauchy’s Remainder Theorem). Let $f$ be differentiable $N+1$ times on $(-R, R)$. For each $a \in (-R, R)$, let $S_N(x, a)$ be the partial sum of the Taylor series for $f$ centered at $a$; in other words, define

\[ S_N(x, a) = \sum_{n=0}^{N} c_n (x-a)^n \quad \text{where} \quad c_n = \frac{f^{(n)}(a)}{n!}. \]

Let $E_N(x, a) = f(x) - S_N(x, a)$. Now fix $x \neq 0$ in $(-R, R)$ and consider $E_N(x, a)$ as a function of $a$.


(a) Find $E_N(x, x)$.


(b) Explain why $E_N(x, a)$ is differentiable with respect to $a$, and show

\[ E_N'(x, a) = \frac{-f^{(N+1)}(a)}{N!} (x-a)^N. \]

(c) Show

\[ E_N(x) = E_N(x, 0) = \frac{f^{(N+1)}(c)}{N!} (x-c)^N x \]

for some $c$ between 0 and $x$. This is Cauchy’s form of the remainder for Taylor series centered at the origin.


Exercise 6.6.10. Consider $f(x) = 1/\sqrt{1-x}$.

(a) Generate the Taylor series for $f$ centered at zero, and use Lagrange’s Remainder Theorem to show the series converges to $f$ on $[0, 1/2]$. (The case $x < 1/2$ is more straightforward while $x = 1/2$ requires some extra care.) What happens when we attempt this with $x > 1/2$?

(b) Use Cauchy’s Remainder Theorem proved in Exercise 6.6.9 to show the series representation for $f$ holds on $[0, 1)$.


6.7  The  Weierstrass  Approximation  Theorem 

Karl Weierstrass’s name is attached to a number of significant results discussed already. The Bolzano-Weierstrass Theorem was fundamental to understanding the relationship between convergence, completeness, and compactness worked out in the early chapters. In this chapter, the Weierstrass M-Test emerged as the primary tool for demonstrating uniform convergence of infinite series. As discussed in Section 5.4, Weierstrass was also responsible for one of the earliest examples of a continuous, nowhere differentiable function, making this discovery in 1872.

In 1885, Weierstrass proved a result that served as an interesting counterpoint to his nowhere differentiable function. This theorem, which also bears his name, would become the catalyst for a new branch of analysis called approximation theory.

Theorem 6.7.1 (Weierstrass Approximation Theorem). Let $f : [a,b] \to \mathbf{R}$ be continuous. Given $\epsilon > 0$, there exists a polynomial $p(x)$ satisfying

\[ |f(x) - p(x)| < \epsilon \]

for all $x \in [a,b]$.


A  restatement  of  the  Weierstrass  Approximation  Theorem  (WAT)  without 
all  the  symbols  is  that  every  continuous  function  on  a  closed  interval  can  be 
uniformly  approximated  by  a  polynomial. 


Exercise 6.7.1. Assuming WAT, show that if $f$ is continuous on $[a,b]$, then there exists a sequence $(p_n)$ of polynomials such that $p_n \to f$ uniformly on $[a,b]$.

Our work in the previous section provides a nice starting point for understanding what WAT is saying. Given a function such as $\sin(x)$, we saw in Example 6.6.4 that the resulting Taylor series converges uniformly on compact sets back to $\sin(x)$. Because the partial sums of a Taylor series are polynomials, this example constitutes a proof of WAT in the very special case of $f(x) = \sin(x)$. It should be clear, however, that Taylor series won’t work in general. To construct a Taylor series, we need $f$ to be an infinitely differentiable function (and even then the Taylor series might fail to approximate $f$), while WAT requires only that $f$ be continuous.

So  should  we  be  surprised  that  such  a  theorem  is  true?  This  is  hard  to  say. 
On a purely intuitive level, if we consider a smooth curve like $f(x) = \sqrt{1-x}$ on
$[-1,1]$, then it doesn’t take too much imagination to believe that a polynomial
might exist that tracks closely with $\sqrt{1-x}$ as $x$ moves over the domain. But
one  of  the  lessons  of  Section  5.4  is  that  a  continuous  function  does  not  have  to 
be  smooth.  Although  it  is  not  Weierstrass’s  original  example,  a  careful  look  at 
the  nowhere  differentiable  function  shown  in  Figure  5.7  makes  the  point  just  as 
well.  Despite  the  unimaginably  jagged  nature  of  the  graph,  according  to  WAT, 
it  is  still  possible  to  find  a  polynomial  that  uniformly  approximates  this  unruly 
function  to  any  prescribed  degree  of  accuracy. 


Interpolation 

Weierstrass’s  theorem  deals  with  approximating  polynomials,  but  a  good  way  to 
get  a  feel  for  the  content  of  this  result  is  to  temporarily  replace  the  polynomials 
in  WAT  with  the  collection  of  all  continuous,  piecewise-linear  functions. 
Figure 6.6: Polygonal approximation of $f(x) = \sqrt{1-x}$.

Definition 6.7.2. A continuous function $\phi : [a,b] \to \mathbf{R}$ is polygonal if there is a partition

\[ a = x_0 < x_1 < x_2 < \cdots < x_n = b \]

of $[a,b]$ such that $\phi$ is linear on each subinterval $[x_{i-1}, x_i]$, where $i = 1, \ldots, n$.

The term “interpolation” refers to the process of finding a function whose graph passes through a given set of points. If, for example, we take the points


(°.l),  1 1,  ^ 


|»|  1  ,(1,0) 


then there is an obvious polygonal function that interpolates these points: it is just the function we get by connecting the points with line segments. Now these four points all lie on the graph of $f(x) = \sqrt{1-x}$, and notice that the resulting polygonal interpolation does a reasonable job of imitating the graph of $f$ (Fig. 6.6). This is not an accident.

Theorem 6.7.3. Let $f : [a,b] \to \mathbf{R}$ be continuous. Given $\epsilon > 0$, there exists a polygonal function $\phi$ satisfying

\[ |f(x) - \phi(x)| < \epsilon \]

for all $x \in [a,b]$.


Exercise  6.7.2.  Prove  Theorem  6.7.3. 


Notice  how  similar  Theorem  6.7.3  is  to  WAT,  the  only  difference  being  that 
we  have  substituted  a  polygonal  function  in  place  of  the  polynomial. 
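The content of Theorem 6.7.3 can be explored numerically before attempting a proof. The Python sketch below builds the polygonal interpolant of $f(x) = \sqrt{1-x}$ on a uniform partition of $[-1,1]$ and reports the largest error seen on a sample grid. The uniform partition, the function names, and the grid size are assumptions chosen for illustration; the theorem itself does not require equally spaced points.

```python
import math
from bisect import bisect_right

def polygonal_interpolant(f, a, b, n):
    """Polygonal function through the points (x_i, f(x_i)) on a uniform partition of [a, b]."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    ys = [f(x) for x in xs]
    def phi(x):
        # Locate the subinterval [xs[i], xs[i+1]] containing x, then interpolate linearly.
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        t = (x - xs[i]) / (xs[i + 1] - xs[i])
        return (1 - t) * ys[i] + t * ys[i + 1]
    return phi

def f(x):
    return math.sqrt(max(0.0, 1.0 - x))

def sup_error(n, samples=2001):
    """Largest sampled value of |f(x) - phi(x)| on [-1, 1] for n subintervals."""
    phi = polygonal_interpolant(f, -1.0, 1.0, n)
    grid = [-1 + 2 * k / (samples - 1) for k in range(samples)]
    return max(abs(f(x) - phi(x)) for x in grid)

for n in [4, 16, 64, 256]:
    print(n, sup_error(n))
```

The sampled error should shrink as the partition is refined, with the slowest improvement near $x = 1$, where the graph of $\sqrt{1-x}$ becomes vertical.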

The strategy for the proof of Theorem 6.7.3 is to first choose an appropriate number of points on the graph of $f$, and then show that the resulting polygonal interpolation of these points does the trick. It’s not unreasonable to suspect that a similar strategy might lead to a proof of the Weierstrass Approximation Theorem. Can we prove WAT by constructing a polynomial interpolation of points on the graph of $f$? Well, no as it turns out, but this is not so easy to see.

Exercise 6.7.3. (a) Find the second degree polynomial $p(x) = q_0 + q_1 x + q_2 x^2$ that interpolates the three points $(-1,1)$, $(0,0)$, and $(1,1)$ on the graph of $g(x) = |x|$. Sketch $g(x)$ and $p(x)$ over $[-1,1]$ on the same set of axes.

(b) Find the fourth degree polynomial that interpolates $g(x) = |x|$ at the points $x = -1, -1/2, 0, 1/2$, and $1$. Add a sketch of this polynomial to the graph from (a).

The previous exercise may still give the impression that a polynomial interpolation approach is going to lead to a proof of WAT, but that isn’t the case. Continuing on with larger and larger numbers of equally spaced points yields high degree polynomials that oscillate very rapidly and actually do a poor job of approximating $g$ between the interpolating points. In fact, it turns out that the resulting sequence of polynomials only converges to $g(x)$ when $x = -1, 0$, or $1$.
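The behavior described above can be probed with a short experiment. The following Python sketch interpolates $g(x) = |x|$ at equally spaced points using the Lagrange form of the interpolating polynomial (one standard way to write it, chosen here for convenience) and reports the largest error found on a sample grid. The function names, node counts, and grid size are assumptions; the printed numbers are only evidence, and in line with the claim above one would expect the error to stop improving and eventually grow as the number of nodes increases.

```python
def lagrange_interpolant(nodes, values):
    """Return the polynomial of degree < len(nodes) passing through the given points."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(nodes, values)):
            term = yi
            for j, xj in enumerate(nodes):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

def max_error(num_nodes, samples=2001):
    """Largest sampled value of | |x| - p(x) | on [-1, 1] for equally spaced nodes."""
    nodes = [-1 + 2 * i / (num_nodes - 1) for i in range(num_nodes)]
    p = lagrange_interpolant(nodes, [abs(t) for t in nodes])
    grid = [-1 + 2 * k / (samples - 1) for k in range(samples)]
    return max(abs(abs(x) - p(x)) for x in grid)

for n in [3, 5, 9, 17, 33]:
    print(n, max_error(n))
```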

Approximating  the  Absolute  Value  Function 

Having  reached  a  temporary  dead  end,  we  need  to  back  up  a  bit  and  take  a 
different  turn.  Let’s  return  to  Theorem  6.7.3  which  asserts  that  every  continuous 
function  can  be  uniformly  approximated  by  a  polygonal  function.  This  should 
feel  like  a  promising  first  step  toward  a  proof  of  WAT  and  indeed  it  is.  If  we  can 
find  a  way  to  approximate  an  arbitrary  polygonal  function  with  polynomials, 
then  a  triangle  inequality  argument  would  finish  the  proof. 

Before  we  get  too  excited  about  this  line  of  attack,  keep  in  mind  that  the 
absolute  value  function  from  Exercise  6.7.3  is  an  example  of  a  polygonal  function 
and  we  are  currently  unsure  how  to  produce  polynomials  to  approximate  it. 
What  has  changed,  however,  is  our  motivation  for  doing  so.  A  moment’s  thought 
reveals  that  handling  the  absolute  value  function  might  be  the  key  to  solving 
the  whole  problem.  Why  is  this?  Every  polygonal  function  is  made  up  of 
line  segments  that  meet  at  corners.  If  we  can  find  polynomials  that  uniformly 
approximate $g(x) = |x|$ with its right angled corner at the origin, then with a
little  cleverness  we  ought  to  be  able  to  handle  more  general  polygonal  functions 
and  prove  WAT  using  Theorem  6.7.3. 


Cauchy’s  Remainder  Formula  for  Taylor  Series 


One elegant way to show $g(x) = |x|$ is the uniform limit of polynomials is via Taylor series, which is a bit surprising given that $|x|$ is not differentiable. The trick, as we will see, is to start by computing the Taylor series for the infinitely differentiable function $\sqrt{1-x}$.

Exercise 6.7.4. Show that $f(x) = \sqrt{1-x}$ has Taylor series coefficients $a_n$ where $a_0 = 1$ and

\[ a_n = \frac{-1 \cdot 3 \cdot 5 \cdots (2n-3)}{2 \cdot 4 \cdot 6 \cdots 2n} \]

for $n \ge 1$.

Our goal is to show

\[ \sqrt{1-x} = \sum_{n=0}^{\infty} a_n x^n \tag{1} \]

for all $x \in [-1,1]$ by showing that the error function

\[ E_N(x) = \sqrt{1-x} - \sum_{n=0}^{N} a_n x^n \]

tends to 0 as $N \to \infty$. To this point, Lagrange’s Remainder Theorem has been the featured tool for jobs like this, but it comes up short in this case. To see exactly why, fix $x \in (0,1]$. Then Theorem 6.6.3 asserts that there exists a $c \in (0,x)$ (dependent on $N$) such that


\[
E_N(x) = \frac{f^{(N+1)}(c)}{(N+1)!} x^{N+1}
= \frac{1}{(N+1)!} \left( \frac{-1 \cdot 3 \cdot 5 \cdots (2N-1)}{2^{N+1}(1-c)^{N+1/2}} \right) x^{N+1}
= \left( \frac{-1 \cdot 3 \cdot 5 \cdots (2N-1)}{2 \cdot 4 \cdot 6 \cdots (2N+2)} \right) \left( \frac{x}{1-c} \right)^{N+1/2} x^{1/2}.
\]

The problem is that $x/(1-c)$ is largest when $c = x$, and $(x/(1-x))^{N+1/2}$ goes exponentially to infinity when $x$ is bigger than $1/2$. This doesn’t mean our Taylor series is only valid on $[0, 1/2]$; it just means we are using the wrong remainder formula.
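A numerical experiment supports the remark that the series is not limited to $[0, 1/2]$. The Python sketch below generates the coefficients from Exercise 6.7.4 through the recurrence $a_{n+1} = a_n (2n-1)/(2n+2)$, which follows from the product formula, and compares partial sums with $\sqrt{1-x}$ at a few sample points, including $x = 0.9$. The function names, the sample points, and the choice of 200 terms are illustrative assumptions, and numerical agreement is of course not a proof.

```python
import math

def sqrt_coefficients(N):
    """Coefficients a_0, ..., a_N of the Taylor series for sqrt(1 - x) at zero.

    Uses a_0 = 1 and a_{n+1} = a_n * (2n - 1) / (2n + 2), which reproduces
    a_n = -(1*3*5*...*(2n-3)) / (2*4*6*...*2n) for n >= 1.
    """
    coeffs = [1.0]
    for n in range(N):
        coeffs.append(coeffs[-1] * (2 * n - 1) / (2 * n + 2))
    return coeffs

def partial_sum(x, N):
    """S_N(x) = a_0 + a_1 x + ... + a_N x^N for the coefficients above."""
    return sum(a * x**n for n, a in enumerate(sqrt_coefficients(N)))

for x in [0.25, 0.5, 0.9]:
    print(x, partial_sum(x, 200), math.sqrt(1 - x))
```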


Exercise 6.7.5. (a) Follow the advice in Exercise 6.6.9 to prove the Cauchy form of the remainder:

\[ E_N(x) = \frac{f^{(N+1)}(c)}{N!} (x-c)^N x \]

for some $c$ between 0 and $x$.

(b) Use this result to prove equation (1) is valid for all $x \in (-1,1)$.

Although Cauchy’s Remainder Theorem doesn’t tell us so, equation (1) is also valid at $x = \pm 1$.
Exercise 6.7.6. (a) Let

\[ c_n = \frac{1 \cdot 3 \cdot 5 \cdots (2n - 1)}{2 \cdot 4 \cdot 6 \cdots 2n} \]

for $n \ge 1$. Show $c_n < \frac{2}{\sqrt{2n + 1}}$.

(b) Use (a) to show that $\sum_{n=0}^{\infty} a_n$ converges (absolutely, in fact) where $a_n$ is
the sequence of Taylor coefficients generated in Exercise 6.7.4.

(c) Carefully explain how this verifies that equation (1) holds for all $x \in [-1, 1]$.

Recall  that  our  goal  is  to  find  polynomials  that  uniformly  approximate  the 
absolute  value  function  on  an  interval  containing  the  non-differentiable  point  at 
the origin. Our Taylor series for $\sqrt{1 - x}$ provides a clever shortcut for handling
this  task. 

Exercise 6.7.7. (a) Use the fact that $|a| = \sqrt{a^2}$ to prove that, given $\epsilon > 0$,
there exists a polynomial $q(x)$ satisfying

\[ \big|\, |x| - q(x) \,\big| < \epsilon \]

for all $x \in [-1, 1]$.

(b) Generalize this conclusion to an arbitrary interval $[a, b]$.
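
The identity $|x| = \sqrt{1 - (1 - x^2)}$ suggests one candidate for $q(x)$: substitute $1 - x^2$ into a partial sum of the series for $\sqrt{1 - x}$. The sketch below is a numerical illustration only, not a proof. It assumes the coefficient ratio $a_n / a_{n-1} = (2n - 3)/(2n)$, and the function names are hypothetical. It measures the largest error on a grid of points in $[-1, 1]$.

```python
def sqrt_taylor_coefficients(count):
    """Return [a_0, ..., a_{count-1}] for sqrt(1 - t) expanded about t = 0."""
    coeffs = [1.0]
    for n in range(1, count):
        coeffs.append(coeffs[-1] * (2 * n - 3) / (2 * n))
    return coeffs

def abs_approximation(x, coeffs):
    """q(x) = sum_n a_n (1 - x^2)^n, a polynomial in x of degree 2(len(coeffs) - 1)."""
    t = 1.0 - x * x
    total, power = 0.0, 1.0
    for a in coeffs:
        total += a * power
        power *= t
    return total

def max_error_on_grid(N, points=2001):
    """Largest value of | |x| - q(x) | over an evenly spaced grid on [-1, 1]."""
    coeffs = sqrt_taylor_coefficients(N + 1)
    grid = [-1.0 + 2.0 * i / (points - 1) for i in range(points)]
    return max(abs(abs(x) - abs_approximation(x, coeffs)) for x in grid)
```

Comparing `max_error_on_grid(25)` with `max_error_on_grid(400)` suggests that the error shrinks slowly, with the worst point sitting at the corner $x = 0$.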


Proving  WAT 

Earlier  we  suggested  that  proving  WAT  for  the  special  case  of  the  absolute  value 
function  was  the  key  to  the  whole  proof.  Now  it  is  time  to  fill  in  the  details. 

Exercise 6.7.8. (a) Fix $a \in [-1, 1]$ and sketch

\[ h_a(x) = \frac{1}{2} \big( |x - a| + (x - a) \big) \]

over $[-1, 1]$. Note that $h_a$ is polygonal and satisfies $h_a(x) = 0$ for all
$x \in [-1, a]$.

(b) Explain why we know $h_a(x)$ can be uniformly approximated with a polynomial
on $[-1, 1]$.

(c) Let $\phi$ be a polygonal function that is linear on each subinterval of the
partition

\[ -1 = a_0 < a_1 < a_2 < \cdots < a_n = 1. \]

Show there exist constants $b_0, b_1, \ldots, b_{n-1}$ so that

\[ \phi(x) = \phi(-1) + b_0 h_{a_0}(x) + b_1 h_{a_1}(x) + \cdots + b_{n-1} h_{a_{n-1}}(x) \]

for all $x \in [-1, 1]$.

(d) Complete the proof of WAT for the interval $[-1, 1]$, and then generalize
to an arbitrary interval $[a, b]$.
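
For readers who want to experiment with part (c), the following sketch checks one natural choice of constants numerically: $b_0$ is the slope of $\phi$ on the first subinterval, and each later $b_k$ is the change in slope at $a_k$. This choice and all function names are assumptions made for illustration; verifying that it works in general is the content of the exercise.

```python
def h(a, x):
    """h_a(x) = (|x - a| + (x - a)) / 2: zero to the left of a, x - a to the right."""
    return 0.5 * (abs(x - a) + (x - a))

def hinge_coefficients(nodes, values):
    """Return [b_0, ..., b_{n-1}] for the polygonal function through (nodes[k], values[k])."""
    slopes = [(values[k + 1] - values[k]) / (nodes[k + 1] - nodes[k])
              for k in range(len(nodes) - 1)]
    # b_0 is the first slope; each later b_k is the change in slope at a_k.
    return [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, len(slopes))]

def hinge_sum(nodes, values, x):
    """Evaluate phi(-1) + sum_k b_k h_{a_k}(x), using a_0, ..., a_{n-1} only."""
    bs = hinge_coefficients(nodes, values)
    return values[0] + sum(b * h(a, x) for b, a in zip(bs, nodes))

def polygonal(nodes, values, x):
    """Direct piecewise-linear interpolation, used only as a reference value."""
    for k in range(len(nodes) - 1):
        if nodes[k] <= x <= nodes[k + 1]:
            t = (x - nodes[k]) / (nodes[k + 1] - nodes[k])
            return values[k] + t * (values[k + 1] - values[k])
    raise ValueError("x lies outside the partition")
```

With, say, `nodes = [-1.0, -0.5, 0.25, 1.0]` and `values = [2.0, -1.0, 0.5, 0.0]`, the two evaluations `hinge_sum` and `polygonal` can be compared at any point of $[-1, 1]$.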

Exercise 6.7.9. (a) Find a counterexample which shows that WAT is not
true if we replace the closed interval $[a, b]$ with the open interval $(a, b)$.

(b) What happens if we replace $[a, b]$ with the closed set $[a, \infty)$? Does the
theorem still hold?

Exercise 6.7.10. Is there a countable subset of polynomials $C$ with the property
that every continuous function on $[a, b]$ can be uniformly approximated by
polynomials from $C$?

Exercise 6.7.11. Assume that $f$ has a continuous derivative on $[a, b]$. Show
that there exists a polynomial $p(x)$ such that

\[ |f(x) - p(x)| < \epsilon \quad \text{and} \quad |f'(x) - p'(x)| < \epsilon \]

for all $x \in [a, b]$.


6.8  Epilogue 


The  argument  sketched  out  here  for  the  Weierstrass  Approximation  Theorem 
is due to Henri Lebesgue, who published his proof in 1898. Its greatest virtue
is  its  relative  simplicity.  Starting  from  a  single  special  case — the  absolute  value 
function — we  managed  to  bootstrap  our  way  up  to  an  arbitrary  continuous 
function.  A  downside  of  this  approach  is  that  by  the  time  we  reach  the  case  of 
a  general  continuous  function,  there  is  no  practical  way  to  explicitly  write  down 
a  formula  for  the  polynomial  that  approximates  it. 

There  are  a  number  of  other  proofs  for  WAT  that  don’t  have  this  drawback. 
A  particularly  popular  one  was  provided  by  Sergei  Bernstein.  Bernstein  employs 
a  family  of  polynomials — now  called  Bernstein  polynomials — that  have  become 
important  in  their  own  right.  Weierstrass’s  original  approach  was  also  quite 
elegant. His proof has much in common with the proof of Fejér's Theorem in
Section 8.5 on Fourier series. Not coincidentally, it is possible to derive yet
another proof of WAT as a corollary to Fejér's Theorem. (See Exercise 8.5.11.)

The Weierstrass Approximation Theorem is set on a closed interval $[a, b]$.
Exercise  6.7.9  is  included  to  emphasize  the  importance  of  the  closed  and  bounded 
nature  of  the  domain,  but  it  should  not  be  too  surprising  that  the  theorem  will 
remain  true  if  we  replace  [a,  b]  with  an  arbitrary  compact  set.  What  about 
replacing  the  set  of  polynomials?  Are  there  other  collections  of  relatively  simple 
continuous  functions  that  can  be  used  to  approximate  an  arbitrary  continuous 
function?  Sure  there  are.  In  Theorem  6.7.3  we  saw  that  polygonal  functions  have 
this  property,  and  there  are  other  examples  as  well.  In  the  late  1930s,  Marshall 
Stone  proved  a  far-reaching  generalization  of  the  Weierstrass  Approximation 
Theorem.  Stone’s  version  of  WAT  starts  with  an  arbitrary  compact  set  K  and 
a  collection  C  of  continuous  functions  on  K  with  the  following  three  properties: 
(i) the constant function $k(x) = 1$ is in $C$,

(ii) if $p, q \in C$ and $c \in \mathbb{R}$ then $p + q$, $pq$, and $cp$ are all in $C$,

(iii) if $x \neq y$ in $K$, then there exists $p \in C$ with $p(x) \neq p(y)$.


Under these conditions, Stone showed that any continuous function on $K$ could
be uniformly approximated by functions in $C$. This result, referred to as the
Stone-Weierstrass Theorem, has a slightly more involved proof that tracks very
closely with Lebesgue's proof of WAT outlined in the previous section. In particular,
both arguments depend fundamentally on being able to approximate
the absolute value function with polynomials.

A collection of functions that possesses property (ii) of the Stone-Weierstrass
Theorem is called an algebra. An algebra that possesses property (iii) is said to
separate points. Having the constant function $k(x) = 1$ in the algebra ensures
we don't have some $x_0 \in K$ where $p(x_0) = 0$ for all functions in our algebra.
(Why would this be problematic?) It is straightforward to check that the set of
polynomials as well as the set of polygonal functions form algebras that separate
points, and so both WAT and Theorem 6.7.3 become special cases of Stone's
general result. For a new example, consider the collection of polynomials with
only even powers on the interval $[0, 1]$. The Stone-Weierstrass Theorem tells
us that this subset of polynomials can still uniformly approximate an arbitrary
continuous function, although if we were to switch our domain to $[-1, 1]$ then
this algebra would no longer separate points. As a final example, consider the
set

\[ C = \{ a_0 + a_1 \cos(x) + \cdots + a_n \cos(nx) : a_0, a_1, \ldots, a_n \in \mathbb{R} \}. \]

In Section 8.5 we take up the theory of Fourier series, which explores when a
function has a representation as an infinite series of trigonometric functions. As
a precursor to that conversation, notice that the Stone-Weierstrass Theorem
tells us at the outset that at least every continuous function on $[0, \pi]$ is the
uniform limit of functions from $C$.


The story from Section 6.6 surrounding Taylor series expansions also deserves
a final word. The ingenuity with which Euler and others found and exploited
power series representations for the cast of familiar functions from calculus
understandably led to speculation that every function could be represented in such
a fashion. (The term "function" at this time implicitly referred to functions that
were infinitely differentiable.) This point of view effectively ended with Cauchy's
discovery in 1821 of the counterexample presented at the end of Section 6.6.
So under what conditions does the Taylor series necessarily converge to the
generating function? Lagrange's Remainder Theorem states that the difference
between the Taylor polynomial $S_N(x)$ and the function $f(x)$ is given by

\[ E_N(x) = \frac{f^{(N+1)}(c)}{(N+1)!}\, x^{N+1}. \]

The $(N+1)!$ term in the denominator grows more rapidly than the $x^{N+1}$ term
in the numerator. Thus, if we knew for instance that

\[ |f^{(N+1)}(c)| \le M \]

for all $c \in (-R, R)$ and $N \in \mathbb{N}$, we could be sure that $E_N(x) \to 0$ and hence
that $S_N(x) \to f(x)$. This is the case for $\sin(x)$, $\cos(x)$, and $e^x$, whose derivatives
do not grow at all as $N \to \infty$. It is also possible to formulate weaker conditions
on the rate of growth of $f^{(N+1)}$ that guarantee convergence.
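
As a small numerical check of this estimate, the sketch below takes $f(x) = \sin(x)$, for which every derivative is bounded by $M = 1$, and compares the actual error of the Taylor polynomial with the bound $M|x|^{N+1}/(N+1)!$. The function names are illustrative and not from the text.

```python
import math

def taylor_sin(x, N):
    """Taylor polynomial S_N(x) of sin about 0, including all terms of degree <= N."""
    return sum((-1) ** j * x ** (2 * j + 1) / math.factorial(2 * j + 1)
               for j in range((N + 1) // 2))

def lagrange_bound(x, N, M=1.0):
    """Upper bound M |x|^(N+1) / (N+1)! on |E_N(x)| when |f^(N+1)| <= M."""
    return M * abs(x) ** (N + 1) / math.factorial(N + 1)

def error(x, N):
    """Actual error |sin(x) - S_N(x)|."""
    return abs(math.sin(x) - taylor_sin(x, N))
```

For a fixed $x$, tabulating `error(x, N)` next to `lagrange_bound(x, N)` for increasing $N$ shows the factorial eventually overwhelming the power $|x|^{N+1}$, even when $|x|$ is well above 1.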

It  is  not  altogether  clear  whether  Cauchy’s  counterexample  should  come  as 
a  surprise.  The  fact  that  every  previous  search  for  a  Taylor  series  ended  in 
success  certainly  gives  the  impression  that  a  power  series  representation  is  an 
intrinsic  property  of  infinitely  differentiable  functions.  But  notice  what  we  are 
saying here. A Taylor series for a function $f$ is constructed from the values
of $f$ and its derivatives at the origin. If the Taylor series converges to $f$ on
some interval $(-R, R)$, then the behavior of $f$ near zero completely determines
its behavior at every point in $(-R, R)$. One implication of this would be that
if two functions with Taylor series agree on some small neighborhood $(-\epsilon, \epsilon)$,
then  these  two  functions  would  have  to  be  the  same  everywhere.  When  it  is 
put  this  way,  we  probably  should  not  expect  a  Taylor  series  to  always  converge 
back  to  the  function  from  which  it  was  derived.  As  we  have  seen,  this  is  not 
the  case  for  real-valued  functions.  What  is  fascinating,  however,  is  that  results 
of  this  nature  do  hold  for  functions  of  a  complex  variable.  The  definition  of  the 
derivative  looks  symbolically  the  same  when  the  real  numbers  are  replaced  by 
complex  numbers,  but  the  implications  are  profoundly  different.  In  this  setting, 
a  function  that  is  differentiable  at  every  point  in  some  open  disc  must  necessarily 
be  infinitely  differentiable  on  this  set.  This  supplies  the  ingredients  to  construct 
the  Taylor  series  that  in  every  instance  converges  uniformly  on  compact  sets  to 
the  function  that  generated  it. 
The Riemann Integral

7.1 Discussion: How Should Integration be Defined?

The Fundamental Theorem of Calculus is a statement about the inverse relationship
between differentiation and integration. It comes in two parts, depending
on whether we are differentiating an integral or integrating a derivative. Under
suitable hypotheses on the functions $f$ and $F$, the Fundamental Theorem of
Calculus states that

(i) $\int_a^b F'(x)\,dx = F(b) - F(a)$ and

(ii) if $G(x) = \int_a^x f(t)\,dt$, then $G'(x) = f(x)$.

Before we can undertake any type of rigorous investigation of these statements,
we need to settle on a definition for $\int_a^b f$. Historically, the concept of integration
was defined as the inverse process of differentiation. In other words, the integral
of a function $f$ was understood to be a function $F$ that satisfied $F' = f$. Newton,
Leibniz, Fermat, and the other founders of calculus then went on to explore the
relationship between antiderivatives and the problem of computing areas. This
approach is ultimately unsatisfying from the point of view of analysis because it
results in a very limited number of functions that can be integrated. Recall that
every derivative satisfies the intermediate value property (Darboux's Theorem,
Theorem 5.2.7). This means that any function with a jump discontinuity cannot
be a derivative. If we want to define integration via antidifferentiation, then we
must accept the consequence that a function as simple as

\[ h(x) = \begin{cases} 1 & \text{for } 0 \le x < 1 \\ 2 & \text{for } 1 \le x \le 2 \end{cases} \]

is not integrable on the interval $[0, 2]$.

[Figure 7.1: A Riemann Sum.]

A  very  interesting  shift  in  emphasis  occurred  around  1850  in  the  work  of 
Cauchy,  and  soon  after  in  the  work  of  Bernhard  Riemann.  The  idea  was  to 
completely  divorce  integration  from  the  derivative  and  instead  use  the  notion 
of  “area  under  the  curve”  as  a  starting  point  for  building  a  rigorous  definition 
of  the  integral.  The  reasons  for  this  were  complicated.  As  we  have  mentioned 
earlier  (Section  1.2),  the  concept  of  function  was  undergoing  a  transformation. 
The traditional understanding of a function as a holistic formula such as $f(x) = x^2$
was being replaced with a more liberal interpretation, which included such
bizarre  constructions  as  Dirichlet’s  function  discussed  in  Section  4.1.  Serving  as 
a  catalyst  to  this  evolution  was  the  budding  theory  of  Fourier  series  (discussed 
in  Section  8.5),  which  required,  among  other  things,  the  need  to  be  able  to 
integrate  these  more  unruly  objects. 

The Riemann integral, as it is called today, is the one usually discussed in
introductory calculus. Starting with a function $f$ on $[a, b]$, we partition the
domain into small subintervals. On each subinterval $[x_{k-1}, x_k]$, we pick some
point $c_k \in [x_{k-1}, x_k]$ and use the $y$-value $f(c_k)$ as an approximation for $f$ on
$[x_{k-1}, x_k]$. Graphically speaking, the result is a row of thin rectangles constructed
to approximate the area between $f$ and the $x$-axis. The area of each
rectangle is $f(c_k)(x_k - x_{k-1})$, and so the total area of all of the rectangles is
given by the Riemann sum (Fig. 7.1)

\[ \sum_{k=1}^{n} f(c_k)(x_k - x_{k-1}). \]

Note  that  “area”  here  comes  with  the  understanding  that  areas  below  the  x-axis 
are  assigned  a  negative  value. 
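
The Riemann sum just described translates almost directly into code. The following is a minimal sketch with hypothetical function names; the midpoint choice of the points $c_k$ in the second helper is only one of many possibilities and is an assumption made for illustration.

```python
def riemann_sum(f, partition, tags):
    """Sum f(c_k) * (x_k - x_{k-1}) over a partition with one sample point per subinterval."""
    if len(tags) != len(partition) - 1:
        raise ValueError("need exactly one tag per subinterval")
    total = 0.0
    for k in range(1, len(partition)):
        left, right, c = partition[k - 1], partition[k], tags[k - 1]
        if not left <= c <= right:
            raise ValueError("each tag must lie in its subinterval")
        # A negative f(c) contributes negative area, as noted in the text.
        total += f(c) * (right - left)
    return total

def uniform_midpoint_sum(f, a, b, n):
    """Riemann sum on n equal subintervals of [a, b], sampling f at each midpoint."""
    partition = [a + (b - a) * k / n for k in range(n + 1)]
    tags = [(partition[k - 1] + partition[k]) / 2 for k in range(1, n + 1)]
    return riemann_sum(f, partition, tags)
```

For example, `uniform_midpoint_sum(lambda x: x * x, 0.0, 1.0, 100)` gives a value close to $1/3$, and applying the same helper to $f(x) = x$ on $[-1, 1]$ illustrates how the signed areas cancel.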
What should be evident from the graph is that the accuracy of the Riemann-sum
approximation seems to improve as the rectangles get thinner. In some
sense, we take the limit of these approximating Riemann sums as the width of
the individual subintervals of the partitions tends to zero. This limit, if it exists,
is Riemann's definition of $\int_a^b f$.

This  brings  us  to  a  handful  of  questions.  Creating  a  rigorous  meaning  for 
the  limit  just  referred  to  is  not  too  difficult.  What  will  be  of  most  interest 
to  us — and  was  also  to  Riemann — is  deciding  what  types  of  functions  can  be 
integrated using this procedure. Specifically, what conditions on $f$ guarantee
that  this  limit  exists? 

The theory of the Riemann integral turns on the observation that smaller
subintervals produce better approximations to the function $f$. On each subinterval
$[x_{k-1}, x_k]$, the function $f$ is approximated by its value at some point
$c_k \in [x_{k-1}, x_k]$. The quality of the approximation is directly related to the
difference

\[ |f(x) - f(c_k)| \]

as $x$ ranges over the subinterval. Because the subintervals can be chosen to
have arbitrarily small width, this means that we want $f(x)$ to be close to $f(c_k)$
whenever $x$ is close to $c_k$. But this sounds like a discussion of continuity! We
will soon see that the continuity of $f$ is intimately related to the existence of
the Riemann integral $\int_a^b f$.

Is continuity sufficient to prove that the Riemann sums converge to a well-defined
limit? Is it necessary, or can the Riemann integral handle a discontinuous
function such as $h(x)$ mentioned earlier? Relying on the intuitive notion
of area, it would seem that $\int_0^2 h = 3$, but does the Riemann integral reach this
conclusion? If so, how discontinuous can a function be before it fails to be integrable?
Can the Riemann integral make sense out of something as pathological
as Dirichlet's function on the interval $[0, 1]$?

A function such as

\[ g(x) = \begin{cases} x^2 \sin\left(\frac{1}{x}\right) & \text{for } x \neq 0 \\ 0 & \text{for } x = 0 \end{cases} \]

raises another interesting question. Here is an example of a differentiable function,
studied in Section 5.1, where the derivative $g'(x)$ is not continuous. As we
explore the class of integrable functions, some attempt must be made to reunite
the  integral  with  the  derivative.  Having  defined  integration  independently  of 
differentiation,  we  would  like  to  come  back  and  investigate  the  conditions  under 
which  equations  (i)  and  (ii)  from  the  Fundamental  Theorem  of  Calculus  stated 
earlier  hold.  If  we  are  making  a  wish  list  for  the  types  of  functions  that  we 
want  to  be  integrable,  then  in  light  of  equation  (i)  it  seems  desirable  to  expect 
this  set  to  at  least  contain  the  set  of  derivatives.  The  fact  that  derivatives  are 
not  always  continuous  is  further  motivation  not  to  content  ourselves  with  an 
integral  that  cannot  handle  some  discontinuities. 
7.2 The Definition of the Riemann Integral


Although  it  has  the  benefit  of  some  polish  due  to  Darboux,  the  development 
of  the  integral  presented  in  this  chapter  is  closely  related  to  the  procedure  just 
discussed.  In  place  of  Riemann  sums,  we  will  construct  upper  sums  and  lower 
sums  (Fig.  7.2),  and  in  place  of  a  limit  we  will  use  a  supremum  and  an  infimum. 

Throughout this section, it is assumed that we are working with a bounded
function $f$ on a closed interval $[a, b]$, meaning that there exists an $M > 0$ such
that $|f(x)| \le M$ for all $x \in [a, b]$.


Partitions,  Upper  Sums,  and  Lower  Sums 


Definition 7.2.1. A partition $P$ of $[a, b]$ is a finite set of points from $[a, b]$ that
includes both $a$ and $b$. The notational convention is to always list the points of
a partition $P = \{x_0, x_1, x_2, \ldots, x_n\}$ in increasing order; thus,

\[ a = x_0 < x_1 < x_2 < \cdots < x_n = b. \]

For each subinterval $[x_{k-1}, x_k]$ of $P$, let

\[ m_k = \inf\{f(x) : x \in [x_{k-1}, x_k]\} \quad \text{and} \quad M_k = \sup\{f(x) : x \in [x_{k-1}, x_k]\}. \]

The lower sum of $f$ with respect to $P$ is given by

\[ L(f, P) = \sum_{k=1}^{n} m_k (x_k - x_{k-1}). \]

Likewise, we define the upper sum of $f$ with respect to $P$ by

\[ U(f, P) = \sum_{k=1}^{n} M_k (x_k - x_{k-1}). \]

For a particular partition $P$, it is clear that $U(f, P) \ge L(f, P)$. The fact that this
same inequality holds if the upper and lower sums are computed with respect
to different partitions is the content of the next two lemmas.

Definition 7.2.2. A partition $Q$ is a refinement of a partition $P$ if $Q$ contains
all of the points of $P$; that is, if $P \subseteq Q$.

Lemma 7.2.3. If $P \subseteq Q$, then $L(f, P) \le L(f, Q)$, and $U(f, P) \ge U(f, Q)$.

Proof. Consider what happens when we refine $P$ by adding a single point $z$ to
some subinterval $[x_{k-1}, x_k]$ of $P$.

[Figure 7.2: Upper and Lower Sums.]

Focusing on the lower sum for a moment, we have

\[
\begin{aligned}
m_k (x_k - x_{k-1}) &= m_k (x_k - z) + m_k (z - x_{k-1}) \\
&\le m'_k (x_k - z) + m''_k (z - x_{k-1}),
\end{aligned}
\]

where

\[ m'_k = \inf\{f(x) : x \in [z, x_k]\} \quad \text{and} \quad m''_k = \inf\{f(x) : x \in [x_{k-1}, z]\} \]

are each necessarily as large or larger than $m_k$.

By induction, we have $L(f, P) \le L(f, Q)$, and an analogous argument holds
for the upper sums. □


Lemma 7.2.4. If $P_1$ and $P_2$ are any two partitions of $[a, b]$, then $L(f, P_1) \le U(f, P_2)$.

Proof. Let $Q = P_1 \cup P_2$ be the so-called common refinement of $P_1$ and $P_2$.
Because $P_1 \subseteq Q$ and $P_2 \subseteq Q$, it follows that

\[ L(f, P_1) \le L(f, Q) \le U(f, Q) \le U(f, P_2). \]

□
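
The two lemmas can be checked by hand on small examples, and a short script makes the bookkeeping easy. The sketch below is restricted to increasing functions, an assumption made so that the infimum and supremum on each subinterval are simply the values at the left and right endpoints. The names `lower_sum`, `upper_sum`, and `square` are illustrative, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def lower_sum(f, partition):
    """L(f, P) for an increasing f: the infimum on [x_{k-1}, x_k] is f(x_{k-1})."""
    pts = sorted(partition)
    return sum(f(pts[k - 1]) * (pts[k] - pts[k - 1]) for k in range(1, len(pts)))

def upper_sum(f, partition):
    """U(f, P) for an increasing f: the supremum on [x_{k-1}, x_k] is f(x_k)."""
    pts = sorted(partition)
    return sum(f(pts[k]) * (pts[k] - pts[k - 1]) for k in range(1, len(pts)))

def square(x):
    """f(x) = x^2, which is increasing on [0, 1]."""
    return x * x

# Two partitions of [0, 1] and their common refinement Q = P1 ∪ P2.
P1 = {Fraction(0), Fraction(1, 2), Fraction(1)}
P2 = {Fraction(0), Fraction(1, 3), Fraction(1)}
Q = P1 | P2
```

Here refining either partition to `Q` raises the lower sum and lowers the upper sum, and `lower_sum(square, P1)` does not exceed `upper_sum(square, P2)`, in line with Lemmas 7.2.3 and 7.2.4.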
Integrability

Intuitively,  it  helps  to  visualize  a  particular  upper  sum  as  an  overestimate  for  the 
value  of  the  integral  and  a  lower  sum  as  an  underestimate.  As  the  partitions  get 
more  refined,  the  upper  sums  get  potentially  smaller  while  the  lower  sums  get 
potentially  larger.  A  function  is  integrable  if  the  upper  and  lower  sums  “meet” 
at  some  common  value  in  the  middle. 

Rather  than  taking  a  limit  of  these  sums,  we  will  instead  make  use  of  the 
Axiom  of  Completeness  and  consider  the  infimum  of  the  upper  sums  and  the 
supremum  of  the  lower  sums. 


Definition 7.2.5. Let $\mathcal{P}$ be the collection of all possible partitions of the
interval $[a, b]$. The upper integral of $f$ is defined to be

\[ U(f) = \inf\{U(f, P) : P \in \mathcal{P}\}. \]

In a similar way, define the lower integral of $f$ by

\[ L(f) = \sup\{L(f, P) : P \in \mathcal{P}\}. \]

The following fact is not surprising.

Lemma 7.2.6. For any bounded function $f$ on $[a, b]$, it is always the case that
$U(f) \ge L(f)$.

Proof. Exercise 7.2.1. □


Definition 7.2.7 (Riemann Integrability). A bounded function $f$ defined
on the interval $[a, b]$ is Riemann-integrable if $U(f) = L(f)$. In this case, we
define $\int_a^b f$ or $\int_a^b f(x)\,dx$ to be this common value; namely,

\[ \int_a^b f = U(f) = L(f). \]


The  modifier  “Riemann”  in  front  of  “integrable”  accurately  suggests  that 
there  are  other  ways  to  define  the  integral.  In  fact,  our  work  in  this  chapter  will 
expose  the  need  for  a  different  approach,  one  of  which  is  discussed  in  Section  8.1. 
In  this  chapter,  the  Riemann  integral  is  the  only  method  under  consideration, 
so  it  will  usually  be  convenient  to  drop  the  modifier  “Riemann”  and  simply  refer 
to  a  function  as  being  “integrable.” 


Criteria  for  Integrability 


To summarize the situation thus far, it is always the case for a bounded function
$f$ on $[a, b]$ that

\[ \sup\{L(f, P) : P \in \mathcal{P}\} = L(f) \le U(f) = \inf\{U(f, P) : P \in \mathcal{P}\}. \]

The function $f$ is integrable if the inequality is an equality. The major thrust
of our investigation of the integral is to describe, as best we can, the class
of integrable functions. The preceding inequality reveals that integrability is
really equivalent to the existence of partitions whose upper and lower sums are
arbitrarily close together.


Theorem 7.2.8 (Integrability Criterion). A bounded function $f$ is integrable
on $[a, b]$ if and only if, for every $\epsilon > 0$, there exists a partition $P_\epsilon$ of $[a, b]$
such that

\[ U(f, P_\epsilon) - L(f, P_\epsilon) < \epsilon. \]

Proof. Let $\epsilon > 0$. If such a partition $P_\epsilon$ exists, then

\[ U(f) - L(f) \le U(f, P_\epsilon) - L(f, P_\epsilon) < \epsilon. \]

Because $\epsilon$ is arbitrary, it must be that $U(f) = L(f)$, so $f$ is integrable. (To be
absolutely precise here, we could throw in a reference to Theorem 1.2.6.)

The proof of the converse statement is a familiar triangle inequality argument
with parentheses in place of absolute value bars because, in each case, we know
which quantity is larger. Because $U(f)$ is the greatest lower bound of the upper
sums, we know that, given some $\epsilon > 0$, there must exist a partition $P_1$ such that

\[ U(f, P_1) < U(f) + \frac{\epsilon}{2}. \]

Likewise, there exists a partition $P_2$ satisfying

\[ L(f, P_2) > L(f) - \frac{\epsilon}{2}. \]

Now, let $P_\epsilon = P_1 \cup P_2$ be the common refinement. Keeping in mind that the
integrability of $f$ means $U(f) = L(f)$, we can write

\[
\begin{aligned}
U(f, P_\epsilon) - L(f, P_\epsilon) &\le U(f, P_1) - L(f, P_2) \\
&< \left( U(f) + \frac{\epsilon}{2} \right) - \left( L(f) - \frac{\epsilon}{2} \right) \\
&= \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.
\end{aligned}
\]

□


In the discussion at the beginning of this chapter, it became clear that integrability
is closely tied to the concept of continuity. To make this observation
more precise, let $P = \{x_0, x_1, x_2, \ldots, x_n\}$ be an arbitrary partition of $[a, b]$, and
define $\Delta x_k = x_k - x_{k-1}$. Then,

\[ U(f, P) - L(f, P) = \sum_{k=1}^{n} (M_k - m_k) \Delta x_k, \]

where $M_k$ and $m_k$ are the supremum and infimum of the function on the interval
$[x_{k-1}, x_k]$, respectively. Our ability to control the size of $U(f, P) - L(f, P)$ hinges
on the differences $M_k - m_k$, which we can interpret as the variation in the range
of the function over the interval $[x_{k-1}, x_k]$. Restricting the variation of $f$ over
arbitrarily small intervals in $[a, b]$ is precisely what it means to say that $f$ is
uniformly continuous on this set.




Theorem 7.2.9. If $f$ is continuous on $[a, b]$, then it is integrable.

Proof. Because $f$ is continuous on a compact set, it must be bounded. It is also
uniformly continuous for the same reason. This means that, given $\epsilon > 0$, there
exists a $\delta > 0$ so that $|x - y| < \delta$ guarantees

\[ |f(x) - f(y)| < \frac{\epsilon}{b - a}. \]

Now, let $P$ be a partition of $[a, b]$ where $\Delta x_k = x_k - x_{k-1}$ is less than $\delta$ for
every subinterval of $P$.

Given a particular subinterval $[x_{k-1}, x_k]$ of $P$, we know from the Extreme
Value Theorem (Theorem 4.4.2) that the supremum $M_k = f(z_k)$ for some $z_k \in
[x_{k-1}, x_k]$. In addition, the infimum $m_k$ is attained at some point $y_k$ also in the
interval $[x_{k-1}, x_k]$. But this means $|z_k - y_k| < \delta$, so

\[ M_k - m_k = f(z_k) - f(y_k) < \frac{\epsilon}{b - a}. \]

Finally,

\[ U(f, P) - L(f, P) = \sum_{k=1}^{n} (M_k - m_k) \Delta x_k < \frac{\epsilon}{b - a} \sum_{k=1}^{n} \Delta x_k = \epsilon, \]

and $f$ is integrable by the criterion given in Theorem 7.2.8.

□
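
To see the recipe of this proof at work on a concrete function, consider $f(x) = x^2$ on $[0, 1]$. The sketch below is an illustration under stated assumptions: it uses the estimate $|x^2 - y^2| \le 2|x - y|$ on $[0, 1]$ to choose $\delta$, relies on $f$ being increasing to read off $M_k$ and $m_k$ at the endpoints, and uses hypothetical function names.

```python
import math
from fractions import Fraction

def uniform_partition(a, b, n):
    """The partition of [a, b] into n subintervals of equal width."""
    return [a + (b - a) * Fraction(k, n) for k in range(n + 1)]

def gap_for_increasing(f, partition):
    """U(f, P) - L(f, P) when f is increasing, so M_k = f(x_k) and m_k = f(x_{k-1})."""
    return sum((f(partition[k]) - f(partition[k - 1])) * (partition[k] - partition[k - 1])
               for k in range(1, len(partition)))

def partition_for_square(eps):
    """A partition of [0, 1] with U - L < eps for f(x) = x^2, following the proof's recipe."""
    a, b = Fraction(0), Fraction(1)
    # On [0, 1], |x^2 - y^2| = |x + y| |x - y| <= 2 |x - y|, so delta = eps / (2 (b - a))
    # guarantees |f(x) - f(y)| < eps / (b - a) whenever |x - y| < delta.
    delta = eps / (2 * (b - a))
    n = math.floor((b - a) / delta) + 1  # ensures the width (b - a) / n is less than delta
    return uniform_partition(a, b, n)
```

Calling `partition_for_square(Fraction(1, 10))` and passing the result to `gap_for_increasing` with $f(x) = x^2$ yields a gap smaller than $1/10$, as Theorem 7.2.8 requires.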


Exercises 

Exercise 7.2.1. Let $f$ be a bounded function on $[a, b]$, and let $P$ be an arbitrary
partition of $[a, b]$. First, explain why $U(f) \ge L(f, P)$. Now, prove Lemma 7.2.6.


Exercise 7.2.2. Consider $f(x) = 1/x$ over the interval $[1, 4]$. Let $P$ be the
partition consisting of the points $\{1, 3/2, 2, 4\}$.

(a) Compute $L(f, P)$, $U(f, P)$, and $U(f, P) - L(f, P)$.

(b) What happens to the value of $U(f, P) - L(f, P)$ when we add the point 3
to the partition?

(c) Find a partition $P'$ of $[1, 4]$ for which $U(f, P') - L(f, P') < 2/5$.

Exercise 7.2.3 (Sequential Criterion for Integrability). (a) Prove that
a bounded function $f$ is integrable on $[a, b]$ if and only if there exists a
sequence of partitions $(P_n)_{n=1}^{\infty}$ satisfying

\[ \lim_{n \to \infty} \left[ U(f, P_n) - L(f, P_n) \right] = 0, \]

and in this case $\int_a^b f = \lim_{n \to \infty} U(f, P_n) = \lim_{n \to \infty} L(f, P_n)$.

(b) For each $n$, let $P_n$ be the partition of $[0, 1]$ into $n$ equal subintervals. Find
formulas for $U(f, P_n)$ and $L(f, P_n)$ if $f(x) = x$. The formula $1 + 2 + 3 +
\cdots + n = n(n + 1)/2$ will be useful.

(c) Use the sequential criterion for integrability from (a) to show directly that
$f(x) = x$ is integrable on $[0, 1]$ and compute $\int_0^1 f$.

Exercise 7.2.4. Let $g$ be bounded on $[a, b]$ and assume there exists a partition
$P$ with $L(g, P) = U(g, P)$. Describe $g$. Is it integrable? If so, what is the value
of $\int_a^b g$?

Exercise 7.2.5. Assume that, for each $n$, $f_n$ is an integrable function on $[a, b]$.
If $(f_n) \to f$ uniformly on $[a, b]$, prove that $f$ is also integrable on this set. (We
will see that this conclusion does not necessarily follow if the convergence is
pointwise.)

Exercise 7.2.6. A tagged partition $(P, \{c_k\})$ is one where in addition to a
partition $P$ we choose a sampling point $c_k$ in each of the subintervals $[x_{k-1}, x_k]$.
The corresponding Riemann sum,

\[ R(f, P) = \sum_{k=1}^{n} f(c_k) \Delta x_k, \]

is discussed in Section 7.1, where the following definition is alluded to.

Riemann's Original Definition of the Integral: A bounded function $f$ is
integrable on $[a, b]$ with $\int_a^b f = A$ if for all $\epsilon > 0$ there exists a $\delta > 0$ such that
for any tagged partition $(P, \{c_k\})$ satisfying $\Delta x_k < \delta$ for all $k$, it follows that

\[ |R(f, P) - A| < \epsilon. \]

Show that if $f$ satisfies Riemann's definition above, then $f$ is integrable in the
sense of Definition 7.2.7. (The full equivalence of these two characterizations of
integrability is proved in Section 8.1.)


Exercise 7.2.7. Let $f : [a, b] \to \mathbb{R}$ be increasing on the set $[a, b]$ (i.e., $f(x) \le
f(y)$ whenever $x < y$). Show that $f$ is integrable on $[a, b]$.
7.3 Integrating Functions with Discontinuities


The  fact  that  continuous  functions  are  integrable  is  not  so  much  a  fortunate 
discovery  as  it  is  evidence  for  a  well-designed  integral.  Riemann’s  integral  is  a 
modification  of  Cauchy’s  definition  of  the  integral,  and  Cauchy’s  definition  was 
crafted  specifically  to  work  on  continuous  functions.  The  interesting  issue  is 
discovering  just  how  dependent  the  Riemann  integral  is  on  the  continuity  of  the 
integrand. 


Example 7.3.1. Consider the function

\[ f(x) = \begin{cases} 1 & \text{for } x \neq 1 \\ 0 & \text{for } x = 1 \end{cases} \]

on the interval $[0, 2]$. If $P$ is any partition of $[0, 2]$, a quick calculation reveals
that $U(f, P) = 2$. The lower sum $L(f, P)$ will be less than 2 because any
subinterval of $P$ that contains $x = 1$ will contribute zero to the value of the
lower sum. The way to show that $f$ is integrable is to construct a partition that
minimizes the effect of the discontinuity by embedding $x = 1$ into a very small
subinterval.

Let $\epsilon > 0$, and consider the partition $P_\epsilon = \{0, 1 - \epsilon/3, 1 + \epsilon/3, 2\}$. Then,

\[ L(f, P_\epsilon) = 1 \left( 1 - \frac{\epsilon}{3} \right) + 0 \left( \frac{2\epsilon}{3} \right) + 1 \left( 1 - \frac{\epsilon}{3} \right) = 2 - \frac{2}{3} \epsilon. \]

Because $U(f, P_\epsilon) = 2$, we have

\[ U(f, P_\epsilon) - L(f, P_\epsilon) = \frac{2}{3} \epsilon < \epsilon. \]

We can now use Theorem 7.2.8 to conclude that $f$ is integrable.
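
The computation in Example 7.3.1 can be replayed with exact arithmetic. The sketch below is specific to this one function and assumes $0 < \epsilon < 3$, so that the four points of $P_\epsilon$ are listed in increasing order; the function names are illustrative.

```python
from fractions import Fraction

def lower_sum(partition):
    """L(f, P) for f = 1 except f(1) = 0: the infimum is 0 exactly on subintervals containing 1."""
    return sum((0 if left <= 1 <= right else 1) * (right - left)
               for left, right in zip(partition, partition[1:]))

def upper_sum(partition):
    """U(f, P): every nondegenerate subinterval contains a point other than 1, so sup f = 1."""
    return sum(1 * (right - left) for left, right in zip(partition, partition[1:]))

def isolating_partition(eps):
    """The partition {0, 1 - eps/3, 1 + eps/3, 2}, valid for 0 < eps < 3."""
    return [Fraction(0), 1 - eps / 3, 1 + eps / 3, Fraction(2)]
```

With `eps = Fraction(3, 10)`, the difference `upper_sum(P) - lower_sum(P)` for `P = isolating_partition(eps)` equals $\frac{2}{3}\epsilon$, matching the formula above.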


Although  the  function  in  Example  7.3.1  is  extremely  simple,  the  method 
used  to  show  it  is  integrable  is  really  the  same  one  used  to  prove  that  any 
bounded  function  with  a  single  discontinuity  is  integrable.  The  notation  in  the 
following  proof  is  more  cumbersome,  but  the  essence  of  the  argument  is  that  the 
misbehavior  of  the  function  at  its  discontinuity  is  isolated  inside  a  particularly 
small  subinterval  of  the  partition. 


Theorem 7.3.2. If $f : [a,b] \to \mathbf{R}$ is bounded, and $f$ is integrable on $[c,b]$ for all $c \in (a,b)$, then $f$ is integrable on $[a,b]$. An analogous result holds at the other endpoint.


Proof. Let $\epsilon > 0$. As usual, our task is to produce a partition $P$ such that $U(f,P) - L(f,P) < \epsilon$. For any partition, we can always write

$$U(f,P) - L(f,P) = \sum_{k=1}^{n} (M_k - m_k)\Delta x_k = (M_1 - m_1)(x_1 - a) + \sum_{k=2}^{n} (M_k - m_k)\Delta x_k,$$

so the first step is to choose $x_1$ close enough to $a$ so that

$$(M_1 - m_1)(x_1 - a) < \frac{\epsilon}{2}.$$

This is not too difficult. Because $f$ is bounded, we know there exists $M > 0$ satisfying $|f(x)| \leq M$ for all $x \in [a,b]$. Noting that $M_1 - m_1 \leq 2M$, let’s pick $x_1$ so that

$$x_1 - a < \frac{\epsilon}{4M}.$$

Now, by hypothesis, $f$ is integrable on $[x_1,b]$, so there exists a partition $P_1$ of $[x_1,b]$ for which

$$U(f,P_1) - L(f,P_1) < \frac{\epsilon}{2}.$$

Finally, we let $P = \{a\} \cup P_1$ be a partition of $[a,b]$, from which it follows that

$$U(f,P) - L(f,P) \leq (2M)(x_1 - a) + \left(U(f,P_1) - L(f,P_1)\right) < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$

□


Theorem  7.3.2  enables  us  to  prove  that  a  bounded  function  on  a  closed 
interval  with  a  single  discontinuity  at  an  endpoint  is  still  integrable.  In  the 
next section, we will prove that integrability on the intervals $[a,b]$ and $[b,d]$ is equivalent to integrability on $[a,d]$. This property, together with an induction argument, leads to the conclusion that any bounded function with a finite number
of  discontinuities  is  still  integrable.  What  if  the  number  of  discontinuities  is 
infinite? 


Example 7.3.3. Recall Dirichlet’s function

$$g(x) = \begin{cases} 1 & \text{for } x \text{ rational} \\ 0 & \text{for } x \text{ irrational} \end{cases}$$

from Section 4.1. If $P$ is some partition of $[0,1]$, then the density of the rationals in $\mathbf{R}$ implies that every subinterval of $P$ will contain a point where $g(x) = 1$. It follows that $U(g,P) = 1$. On the other hand, $L(g,P) = 0$ because the irrationals are also dense in $\mathbf{R}$. Because this is the case for every partition $P$, we see that the upper integral $U(g) = 1$ and the lower integral $L(g) = 0$. The two are not equal, so we conclude that Dirichlet’s function is not integrable.

How  discontinuous  can  a  function  be  before  it  fails  to  be  integrable?  Before 
jumping  to  the  hasty  (and  incorrect)  conclusion  that  the  Riemann  integral  fails 
for  functions  with  more  than  a  finite  number  of  discontinuities,  we  should  realize 
that  Dirichlet’s  function  is  discontinuous  at  every  point  in  [0,1].  It  would  be 
useful  to  investigate  a  function  where  the  discontinuities  are  infinite  in  number 
but  do  not  necessarily  make  up  all  of  [0,1].  Thomae’s  function,  also  defined 
in  Section  4.1,  is  one  such  example.  The  discontinuous  points  of  this  function 
are precisely the rational numbers in [0, 1]. In the exercises to follow we will
see  that  Thomae’s  function  is  Riemann-integrable,  raising  the  bar  for  allowable 
discontinuous  points  to  include  potentially  infinite  sets. 
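
To get a feel for why Thomae’s function is so much tamer than Dirichlet’s, it can help to list the points where it is large. The Python sketch below is a numerical illustration, not a proof; the helper names `thomae` and `points_at_least` are hypothetical, and inputs are restricted to rational numbers represented exactly with `fractions.Fraction` (irrational inputs, where the function is 0, are not modeled).

```python
from fractions import Fraction

def thomae(x):
    """Thomae's function at a rational x given as a Fraction (always stored in lowest terms)."""
    if x == 0:
        return Fraction(1)
    return Fraction(1, x.denominator)

def points_at_least(threshold):
    """All rationals x in [0, 1] with thomae(x) >= threshold, in increasing order."""
    max_denominator = int(1 / Fraction(threshold))  # 1/n >= threshold forces n <= 1/threshold
    candidates = {Fraction(0)}
    for n in range(1, max_denominator + 1):
        for m in range(0, n + 1):
            candidates.add(Fraction(m, n))          # Fraction reduces m/n automatically
    return sorted(x for x in candidates if thomae(x) >= threshold)

big_points = points_at_least(Fraction(1, 4))
print(big_points)
print(len(big_points))  # 7
```

For the threshold 1/4 the list is 0, 1/4, 1/3, 1/2, 2/3, 3/4, 1; lowering the threshold lengthens the list, but for any fixed positive threshold the loop above only ever visits finitely many candidates.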

The  conclusion  of  this  story  is  contained  in  the  doctoral  dissertation  of  Henri 
Lebesgue,  who  presented  his  work  in  1901.  Lebesgue’s  elegant  criterion  for 
Riemann  integrability  is  explored  in  great  detail  in  Section  7.6.  For  the  moment, 
though,  we  will  take  a  short  detour  from  questions  of  integrability  and  construct 
a  proof  of  the  celebrated  Fundamental  Theorem  of  Calculus. 


Exercises 


Exercise 7.3.1. Consider the function

$$f(x) = \begin{cases} 1 & \text{for } 0 \leq x < 1 \\ 2 & \text{for } x = 1 \end{cases}$$

over the interval $[0,1]$.

(a) Show that $L(f,P) = 1$ for every partition $P$ of $[0,1]$.

(b) Construct a partition $P$ for which $U(f,P) < 1 + 1/10$.

(c) Given $\epsilon > 0$, construct a partition $P_\epsilon$ for which $U(f,P_\epsilon) < 1 + \epsilon$.

Exercise 7.3.2. Recall that Thomae’s function

$$t(x) = \begin{cases} 1 & \text{if } x = 0 \\ 1/n & \text{if } x = m/n \in \mathbf{Q} \setminus \{0\} \text{ is in lowest terms with } n > 0 \\ 0 & \text{if } x \notin \mathbf{Q} \end{cases}$$

has a countable set of discontinuities occurring at precisely every rational number. Follow these steps to prove $t(x)$ is integrable on $[0,1]$ with $\int_0^1 t = 0$.

(a) First argue that $L(t,P) = 0$ for any partition $P$ of $[0,1]$.

(b) Let $\epsilon > 0$, and consider the set of points $D_{\epsilon/2} = \{x \in [0,1] : t(x) \geq \epsilon/2\}$. How big is $D_{\epsilon/2}$?

(c) To complete the argument, explain how to construct a partition $P_\epsilon$ of $[0,1]$ so that $U(t,P_\epsilon) < \epsilon$.

Exercise 7.3.3. Let

$$f(x) = \begin{cases} 1 & \text{if } x = 1/n \text{ for some } n \in \mathbf{N} \\ 0 & \text{otherwise.} \end{cases}$$

Show that $f$ is integrable on $[0,1]$ and compute $\int_0^1 f$.

Exercise 7.3.4. Let $f$ and $g$ be functions defined on (possibly different) closed intervals, and assume the range of $f$ is contained in the domain of $g$ so that the composition $g \circ f$ is properly defined.

(a) Show, by example, that it is not the case that if $f$ and $g$ are integrable, then $g \circ f$ is integrable.

Now decide on the validity of each of the following conjectures, supplying a proof or counterexample as appropriate.

(b) If $f$ is increasing and $g$ is integrable, then $g \circ f$ is integrable.

(c) If $f$ is integrable and $g$ is increasing, then $g \circ f$ is integrable.


Exercise 7.3.5. Provide an example or give a reason why the request is impossible.

(a) A sequence $(f_n) \to f$ pointwise, where each $f_n$ has at most a finite number of discontinuities but $f$ is not integrable.

(b) A sequence $(g_n) \to g$ uniformly where each $g_n$ has at most a finite number of discontinuities and $g$ is not integrable.

(c) A sequence $(h_n) \to h$ uniformly where each $h_n$ is not integrable but $h$ is integrable.

Exercise 7.3.6. Let $\{r_1, r_2, r_3, \ldots\}$ be an enumeration of all the rationals in $[0,1]$, and define

$$g_n(x) = \begin{cases} 1 & \text{if } x = r_n \\ 0 & \text{otherwise.} \end{cases}$$

(a) Is $G(x) = \sum_{n=1}^{\infty} g_n(x)$ integrable on $[0,1]$?

(b) Is $F(x) = \sum_{n=1}^{\infty} g_n(x)/n$ integrable on $[0,1]$?

Exercise 7.3.7. Assume $f : [a,b] \to \mathbf{R}$ is integrable.

Show that if $g$ satisfies $g(x) = f(x)$ for all but a finite number of points in $[a,b]$, then $g$ is integrable as well.

Find an example to show that $g$ may fail to be integrable if it differs from $f$ at a countable number of points.


Exercise 7.3.8. As in Exercise 7.3.6, let $\{r_1, r_2, r_3, \ldots\}$ be an enumeration of the rationals in $[0,1]$, but this time define

$$h_n(x) = \begin{cases} 1 & \text{if } r_n < x \leq 1 \\ 0 & \text{if } 0 \leq x \leq r_n. \end{cases}$$

Show $H(x) = \sum_{n=1}^{\infty} h_n(x)/2^n$ is integrable on $[0,1]$ even though it has discontinuities at every rational point.

Exercise 7.3.9 (Content Zero). A set $A \subseteq [a,b]$ has content zero if for every $\epsilon > 0$ there exists a finite collection of open intervals $\{O_1, O_2, \ldots, O_N\}$ that contain $A$ in their union and whose lengths sum to $\epsilon$ or less. Using $|O_n|$ to refer to the length of each interval, we have

$$A \subseteq \bigcup_{n=1}^{N} O_n \quad \text{and} \quad \sum_{n=1}^{N} |O_n| \leq \epsilon.$$

(a) Let $f$ be bounded on $[a,b]$. Show that if the set of discontinuous points of $f$ has content zero, then $f$ is integrable.

(b) Show that any finite set has content zero.

(c) Content zero sets do not have to be finite. They do not have to be countable. Show that the Cantor set $C$ defined in Section 3.1 has content zero.

(d) Prove that

$$h(x) = \begin{cases} 1 & \text{if } x \in C \\ 0 & \text{if } x \notin C \end{cases}$$

is integrable, and find the value of the integral.

7.4  Properties  of  the  Integral 

Before  embarking  on  the  proof  of  the  Fundamental  Theorem  of  Calculus,  we 
need  to  verify  what  are  probably  some  very  familiar  properties  of  the  integral. 
The  discussion  in  the  previous  section  has  already  made  use  of  the  following 
fact. 

Theorem 7.4.1. Assume $f : [a,b] \to \mathbf{R}$ is bounded, and let $c \in (a,b)$. Then, $f$ is integrable on $[a,b]$ if and only if $f$ is integrable on $[a,c]$ and $[c,b]$. In this case, we have

$$\int_a^b f = \int_a^c f + \int_c^b f.$$

Proof. If $f$ is integrable on $[a,b]$, then for $\epsilon > 0$ there exists a partition $P$ such that $U(f,P) - L(f,P) < \epsilon$. Because refining a partition can only potentially bring the upper and lower sums closer together, we can simply add $c$ to $P$ if it is not already there. Then, let $P_1 = P \cap [a,c]$ be a partition of $[a,c]$, and $P_2 = P \cap [c,b]$ be a partition of $[c,b]$. It follows that

$$U(f,P_1) - L(f,P_1) < \epsilon \quad \text{and} \quad U(f,P_2) - L(f,P_2) < \epsilon,$$

implying that $f$ is integrable on $[a,c]$ and $[c,b]$.

Conversely, if we are given that $f$ is integrable on the two smaller intervals $[a,c]$ and $[c,b]$, then given an $\epsilon > 0$ we can produce partitions $P_1$ and $P_2$ of $[a,c]$ and $[c,b]$, respectively, such that

$$U(f,P_1) - L(f,P_1) < \frac{\epsilon}{2} \quad \text{and} \quad U(f,P_2) - L(f,P_2) < \frac{\epsilon}{2}.$$

Letting $P = P_1 \cup P_2$ produces a partition of $[a,b]$ for which

$$U(f,P) - L(f,P) < \epsilon.$$

Thus, $f$ is integrable on $[a,b]$.

Continuing to let $P = P_1 \cup P_2$ as earlier, we have

$$\int_a^b f \leq U(f,P) < L(f,P) + \epsilon = L(f,P_1) + L(f,P_2) + \epsilon \leq \int_a^c f + \int_c^b f + \epsilon,$$

which implies $\int_a^b f \leq \int_a^c f + \int_c^b f$. To get the other inequality, observe that

$$\int_a^c f + \int_c^b f \leq U(f,P_1) + U(f,P_2) < L(f,P_1) + L(f,P_2) + \epsilon = L(f,P) + \epsilon \leq \int_a^b f + \epsilon.$$

Because $\epsilon > 0$ is arbitrary, we must have $\int_a^c f + \int_c^b f \leq \int_a^b f$, so

$$\int_a^c f + \int_c^b f = \int_a^b f,$$

as desired.

□


The  proof  of  Theorem  7.4.1  demonstrates  some  of  the  standard  techniques 
involved  for  proving  facts  about  the  Riemann  integral.  The  next  result  catalogs 
the  remainder  of  the  basic  properties  of  the  integral  that  we  will  need  in  our 
upcoming  arguments. 


Theorem 7.4.2. Assume $f$ and $g$ are integrable functions on the interval $[a,b]$.

(i) The function $f + g$ is integrable on $[a,b]$ with $\int_a^b (f + g) = \int_a^b f + \int_a^b g$.

(ii) For $k \in \mathbf{R}$, the function $kf$ is integrable with $\int_a^b kf = k \int_a^b f$.

(iii) If $m \leq f(x) \leq M$ on $[a,b]$, then $m(b - a) \leq \int_a^b f \leq M(b - a)$.

(iv) If $f(x) \leq g(x)$ on $[a,b]$, then $\int_a^b f \leq \int_a^b g$.

(v) The function $|f|$ is integrable and $\left| \int_a^b f \right| \leq \int_a^b |f|$.

Proof. Properties (i) and (ii) are reminiscent of the Algebraic Limit Theorem and its many descendants (Theorems 2.3.3, 2.7.1, 4.2.4, and 5.2.4). In fact, there is a way to use the Algebraic Limit Theorem for this argument as well. An immediate corollary to Theorem 7.2.8 is that a function $f$ is integrable on $[a,b]$ if and only if there exists a sequence of partitions $(P_n)$ satisfying

$$\lim_{n \to \infty} \left[ U(f,P_n) - L(f,P_n) \right] = 0, \tag{1}$$

and in this case $\int_a^b f = \lim U(f,P_n) = \lim L(f,P_n)$. (A proof for this was requested as Exercise 7.2.3.)

To prove (ii) for the case $k \geq 0$, first verify that for any partition $P$ we have

$$U(kf,P) = kU(f,P) \quad \text{and} \quad L(kf,P) = kL(f,P).$$

Exercise 1.3.5 is used here. Because $f$ is integrable, there exist partitions $(P_n)$ satisfying (1). Turning our attention to the function $(kf)$, we see that

$$\lim_{n \to \infty} \left[ U(kf,P_n) - L(kf,P_n) \right] = \lim_{n \to \infty} k\left[ U(f,P_n) - L(f,P_n) \right] = 0,$$

and the formula in (ii) follows. The case where $k < 0$ is similar except that we have

$$U(kf,P_n) = kL(f,P_n) \quad \text{and} \quad L(kf,P_n) = kU(f,P_n).$$

A proof for (i) can be constructed using similar methods and is requested in Exercise 7.4.5.

To prove (iii), observe that

$$U(f,P) \geq \int_a^b f \geq L(f,P)$$

for any partition $P$. Statement (iii) follows if we take $P$ to be the trivial partition consisting of only the endpoints $a$ and $b$.

For (iv), let $h = g - f$ and use (i), (ii), and (iii).

Because $-|f(x)| \leq f(x) \leq |f(x)|$ on $[a,b]$, statement (v) will follow from (iv) provided that we can show that $|f|$ is actually integrable. The proof of this fact is outlined in Exercise 7.4.1. □


To this point, the quantity $\int_a^b f$ is only defined in the case where $a < b$.

Definition 7.4.3. If $f$ is integrable on the interval $[a,b]$, define

$$\int_b^a f = -\int_a^b f.$$

Also, for $c \in [a,b]$ define

$$\int_c^c f = 0.$$

Definition 7.4.3 is a natural convention to simplify the algebra of integrals. If $f$ is an integrable function on some interval $I$, then it is straightforward to verify that the equation

$$\int_a^b f = \int_a^c f + \int_c^b f$$

from Theorem 7.4.1 remains valid for any three points $a$, $b$, and $c$ chosen in any order from $I$.


Uniform  Convergence  and  Integration 


If $(f_n)$ is a sequence of integrable functions on $[a,b]$, and if $f_n \to f$, then we are inevitably going to want to know whether

$$\int_a^b f_n \to \int_a^b f. \tag{2}$$

This  is  an  archetypical  instance  of  one  of  the  major  themes  of  analysis:  When 
does a mathematical manipulation such as integration respect the limiting process?

If the convergence is pointwise, then any number of things can go wrong. It is possible for each $f_n$ to be integrable but for the limit $f$ not to be integrable (Exercise 7.3.5). Even if the limit function $f$ is integrable, equation (2) may fail to hold. As an example of this, let

$$f_n(x) = \begin{cases} n & \text{if } 0 < x < 1/n \\ 0 & \text{if } x = 0 \text{ or } x \geq 1/n. \end{cases}$$

Each $f_n$ has two discontinuities on $[0,1]$ and so is integrable with $\int_0^1 f_n = 1$. For each $x \in [0,1]$, we have $\lim f_n(x) = 0$ so that $f_n \to 0$ pointwise on $[0,1]$. But now observe that the limit function $f = 0$ certainly integrates to 0, and

$$0 \neq \lim_{n \to \infty} \int_0^1 f_n.$$

As a final remark on what can go wrong in (2), we should point out that it is possible to modify this example to produce a situation where $\lim \int_0^1 f_n$ does not even exist.
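
The two competing facts in this example (every integral equals 1, yet the functions tend to 0 at each point) can be seen side by side in a short Python sketch. This is an illustration under stated assumptions: the names `f` and `integral_f` are hypothetical, and the integral is computed from the closed form height times width rather than by numerical quadrature.

```python
from fractions import Fraction

def f(n, x):
    """f_n(x) = n on the open interval (0, 1/n) and 0 elsewhere on [0, 1]."""
    return n if 0 < x < Fraction(1, n) else 0

def integral_f(n):
    """f_n is a step of height n and width 1/n, so its integral over [0, 1] is height * width."""
    return n * Fraction(1, n)

x = Fraction(1, 100)
values = [f(n, x) for n in (10, 50, 99, 100, 101, 1000)]
print(values)  # [10, 50, 99, 0, 0, 0]
print(all(integral_f(n) == 1 for n in range(1, 1001)))  # True
```

At the fixed point $x = 1/100$ the values grow for a while and then vanish permanently once $1/n \leq x$, while the integral never moves from 1.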

One  way  to  resolve  all  of  these  problems  is  to  add  the  assumption  of  uniform 
convergence. 


Theorem 7.4.4 (Integrable Limit Theorem). Assume that $f_n \to f$ uniformly on $[a,b]$ and that each $f_n$ is integrable. Then, $f$ is integrable and

$$\lim_{n \to \infty} \int_a^b f_n = \int_a^b f.$$

Proof. The proof that $f$ is integrable was requested as Exercise 7.2.5. The properties of the integral listed in Theorem 7.4.2 allow us to assert that for any $f_n$,

$$\left| \int_a^b f_n - \int_a^b f \right| = \left| \int_a^b (f_n - f) \right| \leq \int_a^b |f_n - f|.$$

Let $\epsilon > 0$ be arbitrary. Because $f_n \to f$ uniformly, there exists an $N$ such that

$$|f_n(x) - f(x)| < \epsilon/(b - a) \quad \text{for all } n \geq N \text{ and } x \in [a,b].$$

Thus, for $n \geq N$ we see that

$$\left| \int_a^b f_n - \int_a^b f \right| \leq \int_a^b |f_n - f| \leq \int_a^b \frac{\epsilon}{b - a} = \epsilon,$$

and the result follows.

□
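
The estimate at the heart of this proof, that the integrals differ by at most $(b - a)$ times the largest pointwise gap, can be traced on a concrete sequence. The Python sketch below is only an illustration: the sequence $f_n(x) = x^n/n$ on $[0,1]$ with uniform limit $f = 0$ is an assumed example that does not appear in the text, and the helper names are hypothetical.

```python
from fractions import Fraction

def sup_distance(n):
    """sup of |f_n(x) - f(x)| on [0, 1] for f_n(x) = x**n / n and f = 0 (attained at x = 1)."""
    return Fraction(1, n)

def integral_fn(n):
    """Exact integral of x**n / n over [0, 1]."""
    return Fraction(1, n * (n + 1))

def first_index_within(eps, a=0, b=1):
    """Smallest N with sup|f_N - f| < eps / (b - a); the sup decreases in n, so every n >= N qualifies."""
    n = 1
    while sup_distance(n) >= Fraction(eps) / (b - a):
        n += 1
    return n

eps = Fraction(1, 100)
N = first_index_within(eps)
print(N, integral_fn(N))  # 101 1/10302
```

For this sequence the integral of the limit is 0, so `integral_fn(n)` is exactly the quantity the proof bounds, and it sits below `eps` for every `n >= N`.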


Exercises 

Exercise 7.4.1. Let $f$ be a bounded function on a set $A$, and set

$$M = \sup\{f(x) : x \in A\}, \quad m = \inf\{f(x) : x \in A\},$$

$$M' = \sup\{|f(x)| : x \in A\}, \quad \text{and} \quad m' = \inf\{|f(x)| : x \in A\}.$$

(a) Show that $M - m \geq M' - m'$.

(b) Show that if $f$ is integrable on the interval $[a,b]$, then $|f|$ is also integrable on this interval.

(c) Provide the details for the argument that in this case we have $\left| \int_a^b f \right| \leq \int_a^b |f|$.

Exercise 7.4.2. (a) Let $g(x) = x^3$, and classify each of the following as positive, negative, or zero.

(i) $\int_0^{-1} g + \int_0^1 g$ (ii) $\int_1^0 g + \int_0^1 g$ (iii) $\int_1^{-2} g + \int_0^1 g$.

(b) Show that if $b \leq a \leq c$ and $f$ is integrable on the interval $[b,c]$, then it is still the case that $\int_a^b f = \int_a^c f + \int_c^b f$.


Exercise  7.4.3.  Decide  which  of  the  following  conjectures  is  true  and  supply 
a  short  proof.  For  those  that  are  not  true,  give  a  counterexample. 


(a) If $|f|$ is integrable on $[a,b]$, then $f$ is also integrable on this set.

(b) Assume $g$ is integrable and $g(x) \geq 0$ on $[a,b]$. If $g(x) > 0$ for an infinite number of points $x \in [a,b]$, then $\int_a^b g > 0$.

(c) If $g$ is continuous on $[a,b]$ and $g(x) \geq 0$ with $g(y_0) > 0$ for at least one point $y_0 \in [a,b]$, then $\int_a^b g > 0$.

Exercise 7.4.4. Show that if $f(x) > 0$ for all $x \in [a,b]$ and $f$ is integrable, then $\int_a^b f > 0$.

Exercise 7.4.5. Let $f$ and $g$ be integrable functions on $[a,b]$.

(a) Show that if $P$ is any partition of $[a,b]$, then

$$U(f + g,P) \leq U(f,P) + U(g,P).$$

Provide a specific example where the inequality is strict. What does the corresponding inequality for lower sums look like?

(b) Review the proof of Theorem 7.4.2 (ii), and provide an argument for part (i) of this theorem.

Exercise  7.4.6.  Although  not  part  of  Theorem  7.4.2,  it  is  true  that  the  product 
of  integrable  functions  is  integrable.  Provide  the  details  for  each  step  in  the 
following  proof  of  this  fact: 

(a) If $f$ satisfies $|f(x)| \leq M$ on $[a,b]$, show

$$\left| (f(x))^2 - (f(y))^2 \right| \leq 2M|f(x) - f(y)|.$$

(b) Prove that if $f$ is integrable on $[a,b]$, then so is $f^2$.

(c) Now show that if $f$ and $g$ are integrable, then $fg$ is integrable. (Consider $(f + g)^2$.)

Exercise  7.4.7.  Review  the  discussion  immediately  preceding  Theorem  7.4.4. 

(a) Produce an example of a sequence $f_n \to 0$ pointwise on $[0,1]$ where $\lim_{n \to \infty} \int_0^1 f_n$ does not exist.

(b) Produce an example of a sequence $g_n$ with $\int_0^1 g_n \to 0$ but $g_n(x)$ does not converge to zero for any $x \in [0,1]$. To make it more interesting, let’s insist that $g_n(x) \geq 0$ for all $x$ and $n$.

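Exercise 7.4.8. For each $n \in \mathbf{N}$, let

$$h_n(x) = \begin{cases} 1/2^n & \text{if } 1/2^n < x \leq 1 \\ 0 & \text{if } 0 \leq x \leq 1/2^n, \end{cases}$$

and set $H(x) = \sum_{n=1}^{\infty} h_n(x)$. Show $H$ is integrable and compute $\int_0^1 H$.
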
Exercise 7.4.9. Let $g_n$ and $g$ be uniformly bounded on $[0,1]$, meaning that there exists a single $M > 0$ satisfying $|g(x)| \leq M$ and $|g_n(x)| \leq M$ for all $n \in \mathbf{N}$ and $x \in [0,1]$. Assume $g_n \to g$ pointwise on $[0,1]$ and uniformly on any set of the form $[0,\alpha]$, where $0 < \alpha < 1$.

If all the functions are integrable, show that $\lim_{n \to \infty} \int_0^1 g_n = \int_0^1 g$.

Exercise 7.4.10. Assume $g$ is integrable on $[0,1]$ and continuous at 0. Show

$$\lim_{n \to \infty} \int_0^1 g(x^n)\,dx = g(0).$$


Exercise 7.4.11. Review the original definition of integrability in Section 7.2, and in particular the definition of the upper integral $U(f)$. One reasonable suggestion might be to bypass the complications introduced in Definition 7.2.7 and simply define the integral to be the value of $U(f)$. Then every bounded function is integrable! Although tempting, proceeding in this way has some significant drawbacks. Show by example that several of the properties in Theorem 7.4.2 no longer hold if we replace our current definition of integrability with the proposal that $\int_a^b f = U(f)$ for every bounded function $f$.


7.5  The  Fundamental  Theorem  of  Calculus 


The  derivative  and  the  integral  have  been  independently  defined,  each  in  its  own 
rigorous  mathematical  terms.  The  definition  of  the  derivative  is  motivated  by 
the  problem  of  finding  slopes  of  tangent  lines  and  is  given  in  terms  of  functional 
limits  of  difference  quotients.  The  definition  of  the  integral  grows  out  of  the 
desire  to  calculate  areas  under  nonconstant  functions  and  is  given  in  terms  of 
supremums  and  infimums  of  finite  sums.  The  Fundamental  Theorem  of  Calculus 
reveals  the  remarkable  inverse  relationship  between  the  two  processes. 

The  result  is  stated  in  two  parts.  The  first  is  a  computational  statement 
that  describes  how  an  antiderivative  can  be  used  to  evaluate  an  integral  over 
a particular interval. The second statement is more theoretical in nature, expressing the fact that every continuous function is the derivative of its indefinite
integral. 


Theorem 7.5.1 (Fundamental Theorem of Calculus). (i) If $f : [a,b] \to \mathbf{R}$ is integrable, and $F : [a,b] \to \mathbf{R}$ satisfies $F'(x) = f(x)$ for all $x \in [a,b]$, then

$$\int_a^b f = F(b) - F(a).$$

(ii) Let $g : [a,b] \to \mathbf{R}$ be integrable, and for $x \in [a,b]$, define

$$G(x) = \int_a^x g.$$

Then $G$ is continuous on $[a,b]$. If $g$ is continuous at some point $c \in [a,b]$, then $G$ is differentiable at $c$ and $G'(c) = g(c)$.

Proof. (i) Let $P$ be a partition of $[a,b]$ and apply the Mean Value Theorem to $F$ on a typical subinterval $[x_{k-1}, x_k]$ of $P$. This yields a point $t_k \in (x_{k-1}, x_k)$ where

$$F(x_k) - F(x_{k-1}) = F'(t_k)(x_k - x_{k-1}) = f(t_k)(x_k - x_{k-1}).$$

Now, consider the upper and lower sums $U(f,P)$ and $L(f,P)$. Because $m_k \leq f(t_k) \leq M_k$ (where $m_k$ is the infimum on $[x_{k-1}, x_k]$ and $M_k$ is the supremum), it follows that

$$L(f,P) \leq \sum_{k=1}^{n} \left[ F(x_k) - F(x_{k-1}) \right] \leq U(f,P).$$

But notice that the sum in the middle telescopes so that

$$\sum_{k=1}^{n} \left[ F(x_k) - F(x_{k-1}) \right] = F(b) - F(a),$$

which is independent of the partition $P$. Thus we have

$$L(f) \leq F(b) - F(a) \leq U(f).$$

Because $L(f) = U(f) = \int_a^b f$, we conclude that $\int_a^b f = F(b) - F(a)$.

(ii) To prove the second statement, take $x > y$ in $[a,b]$ and observe that

$$|G(x) - G(y)| = \left| \int_a^x g - \int_a^y g \right| = \left| \int_y^x g \right| \leq \int_y^x |g| \leq M(x - y),$$

where $M > 0$ is a bound on $|g|$. This shows that $G$ is Lipschitz and so is uniformly continuous on $[a,b]$ (Exercise 4.4.9).

Now, let’s assume that $g$ is continuous at $c \in [a,b]$. In order to show that $G'(c) = g(c)$, we rewrite the limit for $G'(c)$ as

$$\lim_{x \to c} \frac{G(x) - G(c)}{x - c} = \lim_{x \to c} \frac{1}{x - c} \left( \int_a^x g(t)\,dt - \int_a^c g(t)\,dt \right) = \lim_{x \to c} \frac{1}{x - c} \int_c^x g(t)\,dt.$$

We would like to show that this limit equals $g(c)$. Thus, given an $\epsilon > 0$, we must produce a $\delta > 0$ such that if $|x - c| < \delta$, then

$$\left| \left( \frac{1}{x - c} \int_c^x g(t)\,dt \right) - g(c) \right| < \epsilon. \tag{1}$$

The assumption of continuity of $g$ gives us control over the difference $|g(t) - g(c)|$. In particular, we know that there exists a $\delta > 0$ such that

$$|t - c| < \delta \quad \text{implies} \quad |g(t) - g(c)| < \epsilon.$$

To take advantage of this, we cleverly write the constant $g(c)$ as

$$g(c) = \frac{1}{x - c} \int_c^x g(c)\,dt$$

and combine the two terms in equation (1) into a single integral. Keeping in mind that $|x - c| \geq |t - c|$, we have that for all $|x - c| < \delta$,

$$\left| \left( \frac{1}{x - c} \int_c^x g(t)\,dt \right) - g(c) \right| = \left| \frac{1}{x - c} \int_c^x \left( g(t) - g(c) \right) dt \right| \leq \frac{1}{(x - c)} \int_c^x |g(t) - g(c)|\,dt < \frac{1}{(x - c)} \int_c^x \epsilon\,dt = \epsilon.$$

□
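
The squeeze $L(f,P) \leq F(b) - F(a) \leq U(f,P)$ from the proof of part (i) can be observed on a concrete pair. The Python sketch below is an illustration under assumptions not taken from the text: the pair $f(x) = 3x^2$, $F(x) = x^3$ on $[0,2]$ is an arbitrary choice, the helper name is hypothetical, and the integrand is assumed increasing so that the infimum and supremum on each subinterval are simply the values at the left and right endpoints.

```python
from fractions import Fraction

def sums_for_increasing(f, a, b, n):
    """Lower and upper sums of an increasing f on the uniform partition of [a, b] into n pieces."""
    dx = Fraction(b - a, n)
    xs = [a + k * dx for k in range(n + 1)]
    lower = sum(f(xs[k - 1]) * dx for k in range(1, n + 1))  # inf is the left-endpoint value
    upper = sum(f(xs[k]) * dx for k in range(1, n + 1))      # sup is the right-endpoint value
    return lower, upper

f = lambda x: 3 * x ** 2
F = lambda x: x ** 3

lower, upper = sums_for_increasing(f, 0, 2, 10)
print(lower <= F(2) - F(0) <= upper)  # True
print(upper - lower)                  # 12/5
```

The gap `upper - lower` telescopes to $(f(b) - f(a))\,\Delta x$ for an increasing integrand, so refining the partition forces both sums toward $F(b) - F(a) = 8$.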


Exercises 

Exercise 7.5.1. (a) Let $f(x) = |x|$ and define $F(x) = \int_{-1}^{x} f$. Find a piecewise algebraic formula for $F(x)$ for all $x$. Where is $F$ continuous? Where is $F$ differentiable? Where does $F'(x) = f(x)$?

(b) Repeat part (a) for the function

$$f(x) = \begin{cases} 1 & \text{if } x < 0 \\ 2 & \text{if } x \geq 0. \end{cases}$$


Exercise  7.5.2.  Decide  whether  each  statement  is  true  or  false,  providing  a 
short  justification  for  each  conclusion. 

(a) If $g = h'$ for some $h$ on $[a,b]$, then $g$ is continuous on $[a,b]$.

(b) If $g$ is continuous on $[a,b]$, then $g = h'$ for some $h$ on $[a,b]$.

(c) If $H(x) = \int_a^x h$ is differentiable at $c \in [a,b]$, then $h$ is continuous at $c$.

Exercise 7.5.3. The hypothesis in Theorem 7.5.1 (i) that $F'(x) = f(x)$ for all $x \in [a,b]$ is slightly stronger than it needs to be. Carefully read the proof and state exactly what needs to be assumed with regard to the relationship between $f$ and $F$ for the proof to be valid.

Exercise 7.5.4. Show that if $f : [a,b] \to \mathbf{R}$ is continuous and $\int_a^x f = 0$ for all $x \in [a,b]$, then $f(x) = 0$ everywhere on $[a,b]$. Provide an example to show that this conclusion does not follow if $f$ is not continuous.

Exercise 7.5.5. The Fundamental Theorem of Calculus can be used to supply a shorter argument for Theorem 6.3.1 under the additional assumption that the sequence of derivatives is continuous.

Assume $f_n \to f$ pointwise and $f_n' \to g$ uniformly on $[a,b]$. Assuming each $f_n'$ is continuous, we can apply Theorem 7.5.1 (i) to get

$$\int_a^x f_n' = f_n(x) - f_n(a)$$

for all $x \in [a,b]$. Show that $g(x) = f'(x)$.

Exercise 7.5.6 (Integration-by-parts). (a) Assume $h(x)$ and $k(x)$ have continuous derivatives on $[a,b]$ and derive the familiar integration-by-parts formula

$$\int_a^b h(t)k'(t)\,dt = h(b)k(b) - h(a)k(a) - \int_a^b h'(t)k(t)\,dt.$$

(b) Explain how the result in Exercise 7.4.6 can be used to slightly weaken the hypothesis in part (a).

Exercise 7.5.7. Use part (ii) of Theorem 7.5.1 to construct another proof of part (i) of Theorem 7.5.1 under the stronger hypothesis that $f$ is continuous. (To get started, set $G(x) = \int_a^x f$.)

Exercise 7.5.8 (Natural Logarithm and Euler’s Constant). Let

$$L(x) = \int_1^x \frac{1}{t}\,dt,$$

where we consider only $x > 0$.

(a) What is $L(1)$? Explain why $L$ is differentiable and find $L'(x)$.

(b) Show that $L(xy) = L(x) + L(y)$. (Think of $y$ as a constant and differentiate $g(x) = L(xy)$.)

(c) Show $L(x/y) = L(x) - L(y)$.

(d) Let

$$\gamma_n = \left( 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} \right) - L(n).$$

Prove that $(\gamma_n)$ converges. The constant $\gamma = \lim \gamma_n$ is called Euler’s constant.

(e) Show how consideration of the sequence $\gamma_{2n} - \gamma_n$ leads to the interesting identity

$$L(2) = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots.$$

Exercise 7.5.9. Given a function $f$ on $[a,b]$, define the total variation of $f$ to be

$$Vf = \sup \left\{ \sum_{k=1}^{n} |f(x_k) - f(x_{k-1})| \right\},$$

where the supremum is taken over all partitions $P$ of $[a,b]$.

(a) If $f$ is continuously differentiable ($f'$ exists as a continuous function), use the Fundamental Theorem of Calculus to show $Vf \leq \int_a^b |f'|$.

(b) Use the Mean Value Theorem to establish the reverse inequality and conclude that $Vf = \int_a^b |f'|$.

Exercise 7.5.10 (Change-of-variable Formula). Let $g : [a,b] \to \mathbf{R}$ be differentiable and assume $g'$ is continuous. Let $f : [c,d] \to \mathbf{R}$ be continuous, and assume that the range of $g$ is contained in $[c,d]$ so that the composition $f \circ g$ is properly defined.

(a) Why are we sure $f$ is the derivative of some function? How about $(f \circ g)g'$?

(b) Prove the change-of-variable formula

$$\int_a^b f(g(x))g'(x)\,dx = \int_{g(a)}^{g(b)} f(t)\,dt.$$


Exercise 7.5.11. Assume $f$ is integrable on $[a,b]$ and has a “jump discontinuity” at $c \in (a,b)$. This means that both one-sided limits exist as $x$ approaches $c$ from the left and from the right, but that

$$\lim_{x \to c^-} f(x) \neq \lim_{x \to c^+} f(x).$$

(This phenomenon is discussed in more detail in Section 4.6.)

(a) Show that, in this case, $F(x) = \int_a^x f$ is not differentiable at $x = c$.

(b) The discussion in Section 5.5 mentions the existence of a continuous monotone function that fails to be differentiable on a dense subset of $\mathbf{R}$. Combine the results of part (a) with Exercise 6.4.10 to show how to construct such a function.


7.6  Lebesgue’s  Criterion  for  Riemann 
Integrability 

We  now  return  to  our  investigation  of  the  relationship  between  continuity  and 
the  Riemann  integral.  We  have  proved  that  continuous  functions  are  integrable 
and  that  the  integral  also  exists  for  functions  with  only  a  finite  number  of  discon¬ 
tinuities.  At  the  opposite  end  of  the  spectrum,  we  saw  that  Dirichlet’s  function, 
which is discontinuous at every point on [0, 1], fails to be Riemann-integrable. The next examples show that the set of discontinuities of an integrable function can be infinite and even uncountable. (These also appear as exercises in
Section  7.3.) 


Riemann-integrable  Functions  with  Infinite  Discontinuities 

Recall from Section 4.1 that Thomae’s function

$$t(x) = \begin{cases} 1 & \text{if } x = 0 \\ 1/n & \text{if } x = m/n \in \mathbf{Q} \setminus \{0\} \text{ is in lowest terms with } n > 0 \\ 0 & \text{if } x \notin \mathbf{Q} \end{cases}$$

is continuous on the set of irrationals and has discontinuities at every rational point. Let’s prove that Thomae’s function is integrable on $[0,1]$ with $\int_0^1 t = 0$.

Let $\epsilon > 0$. The strategy, as usual, is to construct a partition $P_\epsilon$ of $[0,1]$ for which $U(t,P_\epsilon) - L(t,P_\epsilon) < \epsilon$.

Exercise 7.6.1. (a) First, argue that $L(t,P) = 0$ for any partition $P$ of $[0,1]$.

(b) Consider the set of points $D_{\epsilon/2} = \{x : t(x) \geq \epsilon/2\}$. How big is $D_{\epsilon/2}$?

(c) To complete the argument, explain how to construct a partition $P_\epsilon$ of $[0,1]$ so that $U(t,P_\epsilon) < \epsilon$.


We  first  met  the  Cantor  set  C  in  Section  3.1.  We  have  since  learned  that  C 
is  a  compact,  uncountable  subset  of  the  interval  [0, 1]. 

Exercise 7.6.2. Define

$$h(x) = \begin{cases} 1 & \text{if } x \in C \\ 0 & \text{if } x \notin C. \end{cases}$$

(a) Show $h$ has discontinuities at each point of $C$ and is continuous at every point of the complement of $C$. Thus, $h$ is not continuous on an uncountably infinite set.

(b) Now prove that $h$ is integrable on $[0,1]$.


Sets  of  Measure  Zero 

Thomae’s  function  fails  to  be  continuous  at  each  rational  number  in  [0,1]. 
Although this set is infinite, we have seen that any infinite subset of Q is countable. Countably infinite sets are the smallest type of infinite set. The Cantor
set  is  uncountable,  but  it  is  also  small  in  a  sense  that  we  are  now  ready  to  make 
precise.  In  the  introduction  to  Chapter  3,  we  presented  an  argument  that  the 
Cantor  set  has  zero  “length.”  The  term  “length”  is  awkward  here  because  it 
really  should  only  be  applied  to  intervals  or  finite  unions  of  intervals,  which  the 
Cantor  set  is  not.  There  is  a  generalization  of  the  concept  of  length  to  more 
general  sets  called  the  measure  of  a  set.  Of  interest  to  our  discussion  are  subsets 
that  have  measure  zero. 




Definition 7.6.1. A set $A \subseteq \mathbf{R}$ has measure zero if, for all $\epsilon > 0$, there exists a countable collection of open intervals $O_n$ with the property that $A$ is contained in the union of all of the intervals $O_n$ and the sum of the lengths of all of the intervals is less than or equal to $\epsilon$. More precisely, if $|O_n|$ refers to the length of the interval $O_n$, then we have

$$A \subseteq \bigcup_{n=1}^{\infty} O_n \quad \text{and} \quad \sum_{n=1}^{\infty} |O_n| \leq \epsilon.$$


Example 7.6.2. Consider a finite set $A = \{a_1, a_2, \ldots, a_N\}$. To show that A
has measure zero, let $\epsilon > 0$ be arbitrary. For each $1 \le n \le N$, construct the
interval

\[ G_n = \left( a_n - \frac{\epsilon}{2N},\; a_n + \frac{\epsilon}{2N} \right). \]

Clearly, A is contained in the union of these intervals, and

\[ \sum_{n=1}^{N} |G_n| = \sum_{n=1}^{N} \frac{\epsilon}{N} = \epsilon. \]
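
The covering in this example can be checked mechanically. The sketch below is only an illustration: the helper name `cover_finite_set` and the sample points are ours. It builds one open interval of length eps/N around each point and confirms with exact rational arithmetic that the lengths add up to eps.

```python
from fractions import Fraction

def cover_finite_set(points, eps):
    """Cover each of the N points by an open interval of length eps/N."""
    n_points = len(points)
    half_width = Fraction(eps) / (2 * n_points)
    return [(a - half_width, a + half_width) for a in points]

# Sample data chosen for illustration only.
points = [Fraction(0), Fraction(1, 3), Fraction(5, 2)]
eps = Fraction(1, 100)
intervals = cover_finite_set(points, eps)
total_length = sum(right - left for left, right in intervals)
print(total_length == eps)
```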

Exercise  7.6.3.  Show  that  any  countable  set  has  measure  zero. 

Exercise  7.6.4.  Prove  that  the  Cantor  set  has  measure  zero. 

Exercise  7.6.5.  Show  that  if  two  sets  A  and  B  each  have  measure  zero,  then 
A  U  B  has  measure  zero  as  well.  In  addition,  discuss  the  proof  of  the  stronger 
statement  that  the  countable  union  of  sets  of  measure  zero  also  has  measure 
zero.  (This  second  statement  is  true,  but  a  completely  rigorous  proof  requires 
a  result  about  double  summations  discussed  in  Section  2.8.) 


α-Continuity

Definition 7.6.3. Let f be defined on [a, b], and let $\alpha > 0$. The function f is
$\alpha$-continuous at $x \in [a, b]$ if there exists $\delta > 0$ such that for all $y, z \in (x - \delta, x + \delta)$
it follows that $|f(y) - f(z)| < \alpha$.

Let f be a bounded function on [a, b]. For each $\alpha > 0$, define $D_\alpha$ to be the
set of points in [a, b] where the function f fails to be $\alpha$-continuous; that is,

\[ D_\alpha = \{x \in [a, b] : f \text{ is not } \alpha\text{-continuous at } x\}. \tag{1} \]


The concept of α-continuity was previously introduced in Section 4.6. Several
of the ensuing exercises appeared as exercises in that section as well.


Exercise 7.6.6. If $\alpha < \alpha'$, show that $D_{\alpha'} \subseteq D_\alpha$.


Now, let

\[ D = \{x \in [a, b] : f \text{ is not continuous at } x\}. \tag{2} \]

Exercise 7.6.7. (a) Let $\alpha > 0$ be given. Show that if f is continuous at
$x \in [a, b]$, then it is $\alpha$-continuous at x as well. Explain how it follows that
$D_\alpha \subseteq D$.

(b) Show that if f is not continuous at x, then f is not $\alpha$-continuous for some
$\alpha > 0$. Now, explain why this guarantees that

\[ D = \bigcup_{n=1}^{\infty} D_{\alpha_n} \quad \text{where } \alpha_n = 1/n. \]


Exercise 7.6.8. Prove that for a fixed $\alpha > 0$, the set $D_\alpha$ is closed.


Just as with continuity, α-continuity is defined pointwise, and just as with
continuity, uniformity is going to play an important role.

For a fixed $\alpha > 0$, a function $f : A \to \mathbf{R}$ is uniformly α-continuous on A
if there exists a $\delta > 0$ such that whenever x and y are points in A satisfying
$|x - y| < \delta$, it follows that $|f(x) - f(y)| < \alpha$. By imitating the proof of
Theorem 4.4.7, it is completely straightforward to show that if f is α-continuous
at every point on some compact set K, then f is uniformly α-continuous on K.


Compactness  Revisited 

Compactness  of  subsets  of  the  real  line  can  be  described  in  three  equivalent 
ways.  The  following  theorem  appears  toward  the  end  of  Section  3.3. 

Theorem 7.6.4. Let $K \subseteq \mathbf{R}$. The following three statements are all equivalent,
in the sense that if any one is true, then so are the two others.

(i) Every sequence contained in K has a convergent subsequence that converges
to a limit in K.

(ii) K is closed and bounded.

(iii) Given a collection of open intervals $\{G_\lambda : \lambda \in \Lambda\}$ that covers K (that is,
$K \subseteq \bigcup_{\lambda \in \Lambda} G_\lambda$), there exists a finite subcollection $\{G_{\lambda_1}, G_{\lambda_2}, G_{\lambda_3}, \ldots, G_{\lambda_N}\}$
of the original set that also covers K.

The  equivalence  of  (i)  and  (ii)  has  been  used  throughout  the  core  material 
in  the  text.  Characterization  (iii)  has  been  less  central  but  is  essential  to  the 
upcoming  argument.  If  the  characterization  of  compactness  in  terms  of  open 
covers  is  not  familiar,  take  a  moment  to  review  the  second  half  of  Section  3.3 
and  complete  the  proof  that  (i)  and  (ii)  imply  (iii)  outlined  in  Exercise  3.3.9. 

Lebesgue’s  Theorem 

We are now prepared to completely categorize the collection of Riemann-integrable
functions in terms of continuity.

Theorem 7.6.5 (Lebesgue’s Theorem). Let f be a bounded function defined
on the interval [a, b]. Then, f is Riemann-integrable if and only if the set of
points where f is not continuous has measure zero.


Proof. Let $M > 0$ satisfy $|f(x)| \le M$ for all $x \in [a, b]$, and let D and $D_\alpha$ be
defined as in the preceding equations (1) and (2). Let’s first assume that D has
measure zero and prove that our function is integrable.

($\Leftarrow$) Let $\epsilon > 0$ and set

\[ \alpha = \frac{\epsilon}{2(b - a)}. \]


Exercise 7.6.9. Show that there exists a finite collection of disjoint open intervals
$\{G_1, G_2, \ldots, G_N\}$ whose union contains $D_\alpha$ and that satisfies

\[ \sum_{n=1}^{N} |G_n| < \frac{\epsilon}{4M}. \]

Exercise 7.6.10. Let K be what remains of the interval [a, b] after the open
intervals $G_n$ are all removed; that is, $K = [a, b] \setminus \bigcup_{n=1}^{N} G_n$. Argue that f is
uniformly α-continuous on K.


Exercise 7.6.11. Finish the proof in this direction by explaining how to construct
a partition $P_\epsilon$ of [a, b] such that $U(f, P_\epsilon) - L(f, P_\epsilon) < \epsilon$. It will be helpful
to break the sum

\[ U(f, P_\epsilon) - L(f, P_\epsilon) = \sum_{k=1}^{n} (M_k - m_k) \Delta x_k \]

into two parts: one over those subintervals that contain points of $D_\alpha$ and the
other over subintervals that do not.


($\Rightarrow$) For the other direction, assume f is Riemann-integrable. We must argue
that the set D of discontinuities of f has measure zero.

Let $\epsilon > 0$ be arbitrary, and fix $\alpha > 0$. Because f is Riemann-integrable,
there exists a partition $P_\epsilon$ of [a, b] such that $U(f, P_\epsilon) - L(f, P_\epsilon) < \alpha\epsilon$.

Exercise 7.6.12. (a) Prove that $D_\alpha$ has measure zero. Point out that it is
possible to choose a cover for $D_\alpha$ that consists of a finite number of open
intervals.

(b) Show how this implies that D has measure zero. □

Our  main  agenda  in  the  remainder  of  this  section  is  to  employ  Lebesgue’s 
Theorem  in  our  pursuit  of  a  non-integrable  derivative,  but  this  elegant  result 
has  a  number  of  other  applications. 


Exercise 7.6.13. (a) Show that if f and g are integrable on [a, b], then so is
the product fg. (This result was requested in Exercise 7.4.6, but notice
how much easier the argument is now.)

(b) Show that if g is integrable on [a, b] and f is continuous on the range of
g, then the composition $f \circ g$ is integrable on [a, b].

If we instead assume that f is integrable and g is continuous, it actually
doesn’t follow that the composition $f \circ g$ is an integrable function. Producing a
counterexample, however, requires a few more ingredients.


A  Nonintegrable  Derivative 


To this point, our one example of a nonintegrable function is Dirichlet’s
nowhere-continuous function. We close this section with another example that has special
significance. The content of the Fundamental Theorem of Calculus is that integration
and differentiation are inverse processes of each other. If a function f is
differentiable on [a, b], then part (i) of the Fundamental Theorem tells us that

\[ \int_a^b f' = f(b) - f(a), \tag{3} \]

provided $f'$ is integrable. But shouldn’t $f'$ be integrable just by virtue of being
a derivative? A curious side-effect of staring at equation (3) for any length of
time is that it starts to feel as though every derivative should be integrable
because we have an obvious candidate for what the value of the integral ought
to be. Alas, for the Riemann integral at least, reality comes up short of our
expectations. What follows is the construction of a differentiable function f for
which equation (3) fails because $\int_a^b f'$ does not exist.

We will once again be interested in the Cantor set

\[ C = \bigcap_{n=0}^{\infty} C_n, \]

defined in Section 3.1. As an initial step, let’s create a function f(x) that is
differentiable on [0, 1] and whose derivative $f'(x)$ has discontinuities at every
point of C. The key ingredient for this construction is the function

\[ g(x) = \begin{cases} x^2 \sin(1/x) & \text{if } x > 0 \\ 0 & \text{if } x \le 0. \end{cases} \]


Exercise 7.6.14. (a) Find $g'(0)$.

(b) Use the standard rules of differentiation to compute $g'(x)$ for $x \ne 0$.

(c) Explain why, for every $\delta > 0$, $g'(x)$ attains every value between 1 and −1
as x ranges over the set $(-\delta, \delta)$. Conclude that $g'$ is not continuous at
x = 0.
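
A quick numerical experiment makes the oscillation in part (c) visible. The sketch below assumes the product-rule formula $g'(x) = 2x\sin(1/x) - \cos(1/x)$ for x > 0, which is what part (b) asks you to verify; the function name `g_prime_positive` is ours. It samples $g'$ at points approaching zero where $\cos(1/x)$ equals 1 or −1.

```python
import math

def g_prime_positive(x):
    """g'(x) for x > 0, assuming g(x) = x^2 sin(1/x) there (product and chain rules)."""
    if x <= 0:
        raise ValueError("this sketch only covers x > 0")
    return 2 * x * math.sin(1.0 / x) - math.cos(1.0 / x)

for k in (10, 100, 1000):
    x_low = 1.0 / (2 * k * math.pi)         # cos(1/x) = 1 here, so g'(x) is close to -1
    x_high = 1.0 / ((2 * k + 1) * math.pi)  # cos(1/x) = -1 here, so g'(x) is close to +1
    print(k, g_prime_positive(x_low), g_prime_positive(x_high))
```

Both sample sequences tend to zero, yet the values of $g'$ along them stay near −1 and +1 respectively. This is evidence for the claim, not a proof of it.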


Now, we want to transport the behavior of g around zero to each of the endpoints
of the closed intervals that make up the sets $C_n$ used in the definition of
the Cantor set. The formulas are awkward but the basic idea is straightforward.
Start by setting

\[ f_0(x) = 0 \quad \text{on } C_0 = [0, 1]. \]

To define $f_1$ on [0, 1], first assign

\[ f_1(x) = 0 \quad \text{for all } x \in C_1. \]

In the remaining open middle third, put translated “copies” of g oscillating
toward the two endpoints (Fig. 7.3). In terms of a formula, we have

\[ f_1(x) = \begin{cases} 0 & \text{if } x \in [0, 1/3] \\ g(x - 1/3) & \text{if } x \text{ is just to the right of } 1/3 \\ g(-x + 2/3) & \text{if } x \text{ is just to the left of } 2/3 \\ 0 & \text{if } x \in [2/3, 1]. \end{cases} \]

Finally, we splice the two oscillating pieces of $f_1$ together in a way that makes
$f_1$ differentiable and such that

\[ |f_1(x)| \le (x - 1/3)^2 \quad \text{and} \quad |f_1(x)| \le (-x + 2/3)^2. \]

This splicing is no great feat, and we will skip the details so as to keep our
attention focused on the two endpoints 1/3 and 2/3. These are the points
where $f_1'(x)$ fails to be continuous.

To define $f_2(x)$, we start with $f_1(x)$ and do the same trick as before, this
time in the two open intervals (1/9, 2/9) and (7/9, 8/9). The result (Fig. 7.4)
is a differentiable function that is zero on $C_2$ and has a derivative that is not
continuous on the set

\[ \left\{ \frac{1}{9}, \frac{2}{9}, \frac{1}{3}, \frac{2}{3}, \frac{7}{9}, \frac{8}{9} \right\}. \]

Continuing in this fashion yields a sequence of functions $f_0, f_1, f_2, \ldots$ defined
on [0, 1].

Exercise 7.6.15. (a) If $c \in C$, what is $\lim_{n \to \infty} f_n(c)$?

(b) Why does $\lim_{n \to \infty} f_n(x)$ exist for $x \notin C$?

Now, set

\[ f(x) = \lim_{n \to \infty} f_n(x). \]


Exercise 7.6.16. (a) Explain why $f'(x)$ exists for all $x \notin C$.

(b) If $c \in C$, argue that $|f(x)| \le (x - c)^2$ for all $x \in [0, 1]$. Show how this
implies $f'(c) = 0$.

(c) Give a careful argument for why $f'(x)$ fails to be continuous on C. Remember
that C contains many points besides the endpoints of the intervals
that make up $C_1, C_2, C_3, \ldots$.


Let’s take inventory of the situation. Our goal is to create a nonintegrable
derivative. Our function f(x) is differentiable, and $f'$ fails to be continuous on
C. We are not quite done.

Exercise 7.6.17. Why is $f'(x)$ Riemann-integrable on [0, 1]?

The reason the Cantor set has measure zero is that, at each stage, $2^{n-1}$ open
intervals of length $1/3^n$ are removed from $C_{n-1}$. The resulting sum

\[ \sum_{n=1}^{\infty} 2^{n-1} \left( \frac{1}{3^n} \right) \]

converges to one, which means that the approximating sets $C_1, C_2, C_3, \ldots$ have
total lengths tending to zero. Instead of removing open intervals of length $1/3^n$
at each stage, let’s see what happens when we remove intervals of length $1/3^{n+1}$.
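
The bookkeeping of removed lengths is easy to tabulate exactly. The sketch below is an illustration only (the function `remaining_length` and its parameter names are ours). It subtracts the removed lengths stage by stage from the length of $C_0 = [0, 1]$, with the length removed at stage n supplied as a function.

```python
from fractions import Fraction

def remaining_length(stages, removed_length):
    """Total length of C_n after `stages` steps, starting from C_0 = [0, 1].

    At stage n, 2**(n-1) open intervals are removed, each of length
    removed_length(n).
    """
    total = Fraction(1)
    for n in range(1, stages + 1):
        total -= 2 ** (n - 1) * removed_length(n)
    return total

def standard(n):
    """Length of each interval removed at stage n in the usual Cantor construction."""
    return Fraction(1, 3 ** n)

for stages in (1, 2, 5, 10):
    print(stages, remaining_length(stages, standard))
```

Passing a different `removed_length`, such as one returning `Fraction(1, 3 ** (n + 1))`, lets you experiment with the modified construction before proving anything about it.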

Exercise 7.6.18. Show that, under these circumstances, the sum of the lengths
of the intervals making up each $C_n$ no longer tends to zero as $n \to \infty$. What is
this limit?

Figure 7.5: A differentiable function with a non-integrable derivative.

If we again take the intersection $\bigcap_{n=0}^{\infty} C_n$, the result is a Cantor-type set with
the same topological properties: it is closed, compact, perfect, and contains
no intervals. But a consequence of the previous exercise is that it no longer
has measure zero. This is just what we need to define our desired function.
By repeating the preceding construction of f(x) on this new Cantor-type set
of strictly positive measure, we get a differentiable function whose derivative
has too many points of discontinuity (Fig. 7.5). By Lebesgue’s Theorem, this
derivative cannot be integrated using the Riemann integral.

Exercise 7.6.19. As a final gesture, provide the example advertised in
Exercise 7.6.13 of an integrable function f and a continuous function g where the
composition $f \circ g$ is properly defined but not integrable. Exercise 4.3.12 may
be useful.


7.7  Epilogue 

Riemann’s definition of the integral was a modification of Cauchy’s integral,
which was originally designed for the purpose of integrating continuous functions.
In this goal, the Riemann integral was a complete success. For continuous
functions at least, the process of integration now stood on its own rigorous footing,
defined independently of differentiation. As analysis progressed, however,
the dependence of integrability on continuity became problematic. The last
example of Section 7.6 highlights one type of weakness: not every derivative
can be integrated. Another limitation of the Riemann integral arises in association
with limits of sequences of functions. To get a sense of this, let’s once
again consider Dirichlet’s function g(x) introduced in Section 4.1. Recall that
g(x) = 1 whenever x is rational, and g(x) = 0 at every irrational point. Focusing
on the interval [0, 1] for a moment, let


\[ \{r_1, r_2, r_3, r_4, \ldots\} \]

be an enumeration of the countable number of rational points in this interval.
Now, let $g_1(x) = 1$ if $x = r_1$ and define $g_1(x) = 0$ otherwise. Next, define
$g_2(x) = 1$ if x is either $r_1$ or $r_2$, and let $g_2(x) = 0$ at all other points. In general,
for each $n \in \mathbf{N}$, define

\[ g_n(x) = \begin{cases} 1 & \text{if } x \in \{r_1, r_2, \ldots, r_n\} \\ 0 & \text{otherwise.} \end{cases} \]

Notice that each $g_n$ has only a finite number of discontinuities and so is
Riemann-integrable with $\int_0^1 g_n = 0$. But we also have $g_n \to g$ pointwise on the
interval [0, 1]. The problem arises when we remember that Dirichlet’s
nowhere-continuous function is not Riemann-integrable. Thus, the equation

\[ \lim_{n \to \infty} \int_0^1 g_n = \int_0^1 g \tag{1} \]

fails to hold, not because the values on each side of the equal sign are different
but because the value on the right-hand side does not exist. The content of
Theorem 7.4.4 is that this equation does hold whenever we have $g_n \to g$ uniformly.
This is a reasonable way to resolve the situation, but it is a bit unsatisfying
because the deficiency in this case is not entirely with the type of convergence
but lies in the strength of the Riemann integral. If we could make sense of the
right-hand side via some other definition of integration, then maybe equation
(1) would actually be true.

Such a definition was introduced by Henri Lebesgue in 1901. Generally
speaking, Lebesgue’s integral is constructed using a generalization of length
called the measure of a set. In the previous section, we studied sets of measure
zero. In particular, we showed that the rational numbers in [0, 1] (because they
are countable) have measure zero. The irrational numbers in [0, 1] have measure
one. This should not be too surprising because we now have that the measures
of these two disjoint sets add up to the length of the interval [0, 1]. Rather
than chopping up the x-axis to approximate the area under the curve, Lebesgue
suggested partitioning the y-axis. In the case of Dirichlet’s function g, there
are only two range values: zero and one. The integral, according to Lebesgue,
could be defined via

\[ \int_0^1 g = 1 \cdot [\text{measure of set where } g = 1] + 0 \cdot [\text{measure of set where } g = 0] = 1 \cdot 0 + 0 \cdot 1 = 0. \]

With this interpretation of $\int_0^1 g$, equation (1) is now valid!

The Lebesgue integral is presently the standard integral in advanced mathematics.
The theory is taught to all graduate students, as well as to many
undergraduates, and it is the integral used in most research papers where integration
is required. The Lebesgue integral generalizes the Riemann integral in
the sense that any function that is Riemann-integrable is Lebesgue-integrable
and integrates to the same value. The real strength of the Lebesgue integral
is that the class of integrable functions is much larger. Most importantly, this
class includes the limits of different types of Cauchy sequences of integrable
functions. This leads to a group of extremely important convergence theorems
related to equation (1) with hypotheses much weaker than the uniform convergence
assumed in Theorem 7.4.4.

Despite its prevalence, the Lebesgue integral does have a few drawbacks.
There are functions whose improper Riemann integrals exist but that are not
Lebesgue-integrable. Another disappointment arises from the relationship between
integration and differentiation. Even with the Lebesgue integral, it is still
not possible to prove

\[ \int_a^b f' = f(b) - f(a) \]

without some additional assumptions on f. Around 1960, a new integral was
proposed that can integrate a larger class of functions than either the Riemann
integral or the Lebesgue integral and suffers from neither of the preceding
weaknesses. Remarkably, this integral is actually a return to Riemann’s original
technique for defining integration, with some small modifications in how
we describe the “fineness” of the partitions. An introduction to the generalized
Riemann integral is the topic of Section 8.1.


Chapter  8 

Additional  Topics 


The  foundation  in  analysis  provided  by  the  first  seven  chapters  is  sufficient 
background  for  the  exploration  of  some  advanced  and  historically  important 
topics.  The  writing  in  this  chapter  is  similar  to  that  in  the  concluding  project 
sections  of  each  individual  chapter.  Exercises  are  included  within  the  exposition 
and  are  designed  to  make  each  section  a  narrative  investigation  into  a  significant 
achievement  in  the  field  of  analysis. 


8.1  The  Generalized  Riemann  Integral 


Chapter 7 concluded with Henri Lebesgue’s elegant result that a bounded function
is Riemann-integrable if and only if its points of discontinuity form a set
of  measure  zero.  To  eliminate  the  dependence  of  integrability  on  continuity, 
Lebesgue  proposed  a  new  method  of  integration  that  has  become  the  standard 
integral  in  mathematics.  In  the  Epilogue  to  Chapter  7,  we  briefly  outlined  some 
of  the  strengths  and  weaknesses  of  the  Lebesgue  integral,  concluding  with  a  look 
back  to  the  Fundamental  Theorem  of  Calculus  (Theorem  7.5.1).  (Lebesgue’s 
measure-zero  criterion  is  not  a  prerequisite  for  understanding  the  material  in 
this  section,  but  the  discussion  in  Section  7.7  provides  some  useful  context  for 
what  follows.) 

If F is a differentiable function on [a, b], then in a perfect world we might
hope to prove that

\[ \int_a^b F' = F(b) - F(a). \tag{1} \]

Notice that although this is the conclusion of part (i) of Theorem 7.5.1, there
we needed the additional requirement that $F'$ be Riemann-integrable. To drive
this point home, Section 7.6 concluded with an example of a function that has
a derivative that the Riemann integral cannot handle. The Lebesgue integral
alluded to earlier is a significant improvement. It can integrate our example
from Section 7.6, but ultimately it too suffers from the same setback. Not every
derivative is integrable, whether we use the Riemann integral or the Lebesgue
integral.

What follows is a short introduction to the generalized Riemann integral,
discovered independently around 1960 by Jaroslav Kurzweil and Ralph Henstock.
As mentioned in Section 7.7, this lesser-known modification of the Riemann
integral can actually integrate a larger class of functions than Lebesgue’s
ubiquitous integral and yields a surprisingly simple proof of equation (1) above with
no additional hypotheses.


The  Riemann  Integral  as  a  Limit 


Let

\[ P = \{x_0, x_1, x_2, \ldots, x_n\} \]

be a partition of [a, b]. A tagged partition is one where in addition to P we have
chosen points $c_k$ in each of the subintervals $[x_{k-1}, x_k]$. This sets the stage for
the concept of a Riemann sum. Given a function $f : [a, b] \to \mathbf{R}$, and a tagged
partition $(P, \{c_k\}_{k=1}^{n})$, the Riemann sum generated by this partition is given by

\[ R(f, P) = \sum_{k=1}^{n} f(c_k)(x_k - x_{k-1}). \]
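
For readers who like to compute, here is a minimal sketch of the Riemann sum of a tagged partition. The function name `riemann_sum` and the list-based representation of P and its tags are our own choices for illustration.

```python
def riemann_sum(f, partition, tags):
    """R(f, P): the sum of f(c_k) * (x_k - x_{k-1}) over a tagged partition.

    `partition` is the list [x_0, ..., x_n] and `tags` is [c_1, ..., c_n].
    """
    if len(tags) != len(partition) - 1:
        raise ValueError("need exactly one tag per subinterval")
    total = 0.0
    for k in range(1, len(partition)):
        left, right = partition[k - 1], partition[k]
        c = tags[k - 1]
        if not left <= c <= right:
            raise ValueError("tag lies outside its subinterval")
        total += f(c) * (right - left)
    return total

# f(x) = x^2 on [0, 1] with three subintervals and hand-picked tags.
partition = [0.0, 0.25, 0.5, 1.0]
tags = [0.0, 0.5, 1.0]
print(riemann_sum(lambda x: x * x, partition, tags))  # prints 0.5625
```

Changing only the tags changes the value of the sum, which is why a tagged partition carries more information than P alone.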

Looking back at the definition of the upper sum

\[ U(f, P) = \sum_{k=1}^{n} M_k (x_k - x_{k-1}) \quad \text{where } M_k = \sup\{f(x) : x \in [x_{k-1}, x_k]\}, \]

and the lower sum

\[ L(f, P) = \sum_{k=1}^{n} m_k (x_k - x_{k-1}) \quad \text{where } m_k = \inf\{f(x) : x \in [x_{k-1}, x_k]\}, \]

it should be clear that

\[ L(f, P) \le R(f, P) \le U(f, P) \]

for any bounded function f. In Definition 7.2.7, we characterized integrability
by insisting that the infimum of the upper sums equal the supremum of the
lower sums. Any Riemann sum is going to fall between a particular upper and
lower sum. If the upper and lower sums are converging to some common value,
then the Riemann sums are also eventually close to this value as well. The next
theorem shows that it is possible to characterize Riemann integrability in a way
equivalent to Definition 7.2.7 using an ε–δ-type definition applied to Riemann
sums.

Definition 8.1.1. Let $\delta > 0$. A partition P is δ-fine if every subinterval
$[x_{k-1}, x_k]$ satisfies $x_k - x_{k-1} < \delta$. In other words, every subinterval has width
less than δ.


Theorem 8.1.2 (Limit Criterion for Riemann Integrability). A bounded
function $f : [a, b] \to \mathbf{R}$ is Riemann-integrable with

\[ \int_a^b f = A \]

if and only if, for every $\epsilon > 0$, there exists a $\delta > 0$ such that, for any tagged
partition $(P, \{c_k\})$ that is δ-fine, it follows that

\[ |R(f, P) - A| < \epsilon. \]


Before attempting the proof, we should point out that, in some treatments,
the criterion in Theorem 8.1.2 is actually taken as the definition of Riemann
integrability. In fact, this is how Riemann originally defined the concept. The spirit
of  this  theorem  is  close  to  what  is  taught  in  most  introductory  calculus  courses. 
To  approximate  the  area  under  the  curve,  Riemann  sums  are  constructed.  The 
hope  is  that  as  the  partitions  become  finer,  the  corresponding  approximations 
get  closer  to  the  value  of  the  integral.  The  content  of  Theorem  8.1.2  is  that 
if  the  function  is  integrable,  then  these  approximations  do  indeed  converge  to 
the  value  of  the  integral,  regardless  of  how  the  tags  are  chosen.  Conversely,  if 
the  approximating  Riemann  sums  for  finer  and  finer  partitions  collect  around 
some  value  A ,  then  the  function  is  integrable  and  integrates  to  A. 
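
The behavior described here can be observed numerically. The sketch below is an experiment under assumptions of our own choosing: the integrand $f(x) = x^2$ on [0, 1] (whose integral is 1/3), equal-width partitions, and tags drawn at random with a fixed seed. The helper name `random_tag_riemann_sum` is hypothetical.

```python
import random

def random_tag_riemann_sum(f, a, b, n, rng):
    """Riemann sum over n equal subintervals of [a, b] with randomly chosen tags."""
    width = (b - a) / n
    total = 0.0
    for k in range(n):
        left = a + k * width
        tag = left + rng.random() * width  # a point of the k-th subinterval
        total += f(tag) * width
    return total

rng = random.Random(0)  # fixed seed so that runs are repeatable
for n in (10, 100, 1000, 10000):
    approx = random_tag_riemann_sum(lambda x: x * x, 0.0, 1.0, n, rng)
    print(n, abs(approx - 1.0 / 3.0))
```

Because this f is increasing, the upper and lower sums for n equal subintervals differ by exactly 1/n, so every printed error is at most 1/n no matter how the tags fall.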

Proof. ($\Rightarrow$) For the forward direction, we begin with the assumption that f is
integrable on [a, b]. Given an $\epsilon > 0$, we must produce a $\delta > 0$ such that if
$(P, \{c_k\})$ is any tagged partition that is δ-fine, then $|R(f, P) - \int_a^b f| < \epsilon$.

Because f is integrable, we know there exists a partition $P_\epsilon$ such that

\[ U(f, P_\epsilon) - L(f, P_\epsilon) < \frac{\epsilon}{3}. \]

Let $M > 0$ be a bound on $|f|$, and let n be the number of subintervals of $P_\epsilon$ (so
that $P_\epsilon$ really consists of n + 1 points in [a, b]). We will argue that choosing

\[ \delta = \epsilon / (9nM) \]

has the desired property.

Here is the idea. Let $(P, \{c_k\})$ be an arbitrary tagged partition of [a, b] that
is δ-fine, and let $P' = P \cup P_\epsilon$. The key is to establish the string of inequalities

\[ L(f, P') - \frac{\epsilon}{3} < L(f, P) \le U(f, P) < U(f, P') + \frac{\epsilon}{3}. \]

Exercise 8.1.1. (a) Explain why both the Riemann sum $R(f, P)$ and $\int_a^b f$
fall between $L(f, P)$ and $U(f, P)$.




(b) Explain why $U(f, P') - L(f, P') < \epsilon/3$.

By the previous exercise, if we can show $U(f, P) < U(f, P') + \epsilon/3$ (and
similarly $L(f, P') - \epsilon/3 < L(f, P)$), then it will follow that

\[ \left| R(f, P) - \int_a^b f \right| < \epsilon \]

and the proof will be done. Thus, we turn our attention toward estimating the
distance between $U(f, P)$ and $U(f, P')$.

Exercise 8.1.2. Explain why $U(f, P) - U(f, P') \ge 0$.

A typical term in either $U(f, P)$ or $U(f, P')$ has the form $M_k(x_k - x_{k-1})$,
where $M_k$ is the supremum of f over $[x_{k-1}, x_k]$. A good number of these terms
appear in both upper sums and so cancel out.

Exercise 8.1.3. (a) In terms of n, what is the largest number of terms of the
form $M_k(x_k - x_{k-1})$ that could appear in one of $U(f, P)$ or $U(f, P')$ but
not the other?

(b) Finish the proof in this direction by arguing that

\[ U(f, P) - U(f, P') < \epsilon/3. \]

($\Leftarrow$) For this direction, we assume that the ε–δ criterion in Theorem 8.1.2
holds and argue that f is integrable. Integrability, as we have defined it, depends
on our ability to choose partitions for which the upper sums are close to the
lower sums. We have remarked that given any partition P, it is always the case
that

\[ L(f, P) \le R(f, P) \le U(f, P) \]

no matter which tags are chosen to compute $R(f, P)$.

Exercise 8.1.4. (a) Show that if f is continuous, then it is possible to pick
tags $\{c_k\}_{k=1}^{n}$ so that

\[ R(f, P) = U(f, P). \]

Similarly, there are tags for which $R(f, P) = L(f, P)$ as well.

(b) If f is not continuous, it may not be possible to find tags for which
$R(f, P) = U(f, P)$. Show, however, that given an arbitrary $\epsilon > 0$, it
is possible to pick tags for P so that

\[ U(f, P) - R(f, P) < \epsilon. \]

The analogous statement holds for lower sums.


Exercise 8.1.5. Use the results of the previous exercise to finish the proof of
Theorem 8.1.2. □

Gauges and δ(x)-fine Partitions

The key to the generalized Riemann integral is to allow the δ in Theorem 8.1.2
to be a function of x.

Definition 8.1.3. A function $\delta : [a, b] \to \mathbf{R}$ is called a gauge on [a, b] if $\delta(x) > 0$
for all $x \in [a, b]$.

Definition 8.1.4. Given a particular gauge $\delta(x)$, a tagged partition $(P, \{c_k\}_{k=1}^{n})$
is δ(x)-fine if every subinterval $[x_{k-1}, x_k]$ satisfies $x_k - x_{k-1} < \delta(c_k)$. In other
words, each subinterval $[x_{k-1}, x_k]$ has width less than $\delta(c_k)$.

It is important to see that if $\delta(x)$ is a constant function, then Definition 8.1.4
says precisely the same thing as Definition 8.1.1. In the case where $\delta(x)$ is not a
constant, Definition 8.1.4 describes a way of measuring the fineness of partitions
that is quite different.
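
To see how the tags enter the picture, it can help to test fineness mechanically. The sketch below is illustrative only: the function `is_gauge_fine` and both sample gauges are our own inventions, not taken from the text. It checks the width condition of Definition 8.1.4 subinterval by subinterval using exact rational arithmetic.

```python
from fractions import Fraction

def is_gauge_fine(partition, tags, gauge):
    """Check x_{k-1} <= c_k <= x_k and x_k - x_{k-1} < gauge(c_k) for every k."""
    if len(tags) != len(partition) - 1:
        return False
    for k in range(1, len(partition)):
        left, right, c = partition[k - 1], partition[k], tags[k - 1]
        if not (left <= c <= right and right - left < gauge(c)):
            return False
    return True

def constant_gauge(x):
    """A constant gauge, which behaves like Definition 8.1.1."""
    return Fraction(3, 4)

def growing_gauge(x):
    """A hypothetical non-constant gauge, chosen only for illustration."""
    return Fraction(1, 4) + x

partition = [Fraction(0), Fraction(1, 2), Fraction(1)]
tags = [Fraction(1, 4), Fraction(1)]
print(is_gauge_fine(partition, tags, constant_gauge))
print(is_gauge_fine(partition, tags, growing_gauge))
```

With the constant gauge the tags are irrelevant, but with the non-constant one the same partition passes or fails depending on which tags are chosen. Replacing the first tag by 1/2 is one way to see this.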


Exercise 8.1.6. Consider the interval [0, 1].

(a) If $\delta(x) = 1/9$, find a δ(x)-fine tagged partition of [0, 1]. Does the choice
of tags matter in this case?

(b) Let

\[ \delta(x) = \begin{cases} 1/4 & \text{if } x = 0 \\ x/3 & \text{if } 0 < x \le 1. \end{cases} \]

Construct a δ(x)-fine tagged partition of [0, 1].


The tinkering required in Exercise 8.1.6 (b) may cast doubt on whether
an arbitrary gauge always admits a δ(x)-fine partition. However, it is not too
difficult to show that this is indeed the case.

Theorem 8.1.5. Given a gauge $\delta(x)$ on an interval [a, b], there exists a tagged
partition $(P, \{c_k\}_{k=1}^{n})$ that is δ(x)-fine.

Proof. Let $I_0 = [a, b]$. It may be possible to find a tag such that the trivial
partition P = {a, b} works. Specifically, if $b - a < \delta(x)$ for some $x \in [a, b]$, then
we can set $c_1$ equal to such an x and notice that $(P, \{c_1\})$ is δ(x)-fine. If no
such x exists, then bisect [a, b] into two equal halves.


Exercise  8.1.7.  Finish  the  proof  of  Theorem  8.1.5. 


□ 
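
The bisection idea in this proof suggests a simple search procedure. The sketch below is a heuristic under loud assumptions, not the proof itself. It tries only the endpoints and midpoint of each piece as candidate tags, whereas the proof allows any point of the interval, and it gives up after a fixed depth. The function name `fine_tagged_partition`, the `max_depth` parameter, and the sample gauge are all hypothetical.

```python
from fractions import Fraction

def fine_tagged_partition(a, b, gauge, max_depth=50):
    """Search for a gauge-fine tagged partition of [a, b] by repeated bisection.

    Only the endpoints and midpoint of each piece are tried as tags, so this
    may raise RuntimeError even when a fine partition exists.
    """
    pieces = []  # (left, right, tag) triples, appended from left to right

    def build(left, right, depth):
        mid = (left + right) / 2
        for tag in (left, mid, right):
            if right - left < gauge(tag):
                pieces.append((left, right, tag))
                return
        if depth == max_depth:
            raise RuntimeError("no fine partition found with the tags tried")
        build(left, mid, depth + 1)
        build(mid, right, depth + 1)

    build(Fraction(a), Fraction(b), 0)
    partition = [pieces[0][0]] + [right for _, right, _ in pieces]
    tags = [tag for _, _, tag in pieces]
    return partition, tags

def gauge(x):
    """A hypothetical gauge for illustration: small near 0, larger near 1."""
    return Fraction(1, 10) + x / 2

partition, tags = fine_tagged_partition(0, 1, gauge)
print(partition)
print(tags)
```

The subintervals come out narrow where the gauge is small and wide where it is large, which is the flexibility that a constant δ cannot offer.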


Generalized  Riemann  Integrability 

Keeping in mind that Theorem 8.1.2 offers an equivalent way to define Riemann
integrability, we now propose a new method for defining the value of the integral.

Definition 8.1.6. A function f on [a, b] has generalized Riemann integral A
if, for every $\epsilon > 0$, there exists a gauge $\delta(x)$ on [a, b] such that for each tagged
partition $(P, \{c_k\}_{k=1}^{n})$ that is δ(x)-fine, it is true that

\[ |R(f, P) - A| < \epsilon. \]

In this case, we write $A = \int_a^b f$.

Theorem 8.1.7. If a function has a generalized Riemann integral, then the
value of the integral is unique.

Proof. Assume that a function f has generalized Riemann integral $A_1$ and that
it also has generalized Riemann integral $A_2$. We must prove $A_1 = A_2$.

Exercise 8.1.8. Finish the argument. □

The implications of Definition 8.1.6 on the resulting class of integrable functions
are far reaching. This is somewhat surprising given that the criteria for
integrability in Definition 8.1.6 and Theorem 8.1.2 differ in such a small way.
One observation that should be immediately evident is the following.

Exercise 8.1.9. Explain why every function that is Riemann-integrable with
$\int_a^b f = A$ must also have generalized Riemann integral A.

The converse statement is not true, and that is the important point. One
example that we have of a non-Riemann-integrable function is Dirichlet’s function

\[ g(x) = \begin{cases} 1 & \text{if } x \in \mathbf{Q} \\ 0 & \text{if } x \notin \mathbf{Q} \end{cases} \]

which has discontinuities at every point of R.

Theorem 8.1.8. Dirichlet’s function g(x) is generalized Riemann-integrable on
[0, 1] with $\int_0^1 g = 0$.

Proof. Let $\epsilon > 0$. By Definition 8.1.6, we must construct a gauge $\delta(x)$ on $[0, 1]$
such that whenever $(P, \{c_k\}_{k=1}^n)$ is a $\delta(x)$-fine tagged partition, it follows that

$$0 \le \sum_{k=1}^n g(c_k)(x_k - x_{k-1}) < \epsilon.$$

The gauge represents a restriction on the size of $\Delta x_k = x_k - x_{k-1}$ in the sense
that $\Delta x_k < \delta(c_k)$. The Riemann sum consists of products of the form $g(c_k)\Delta x_k$.
Thus, for irrational tags, there is nothing to worry about because $g(c_k) = 0$ in
this case. Our task is to make sure that any time a tag $c_k$ is rational, it comes
from a suitably thin subinterval.

Let $\{r_1, r_2, r_3, \ldots\}$ be an enumeration of the countable set of rational numbers
contained in $[0, 1]$. For each $r_k$, set $\delta(r_k) = \epsilon/2^{k+1}$. For $x$ irrational, set
$\delta(x) = 1$.

Exercise 8.1.10. Show that if $(P, \{c_k\}_{k=1}^n)$ is a $\delta(x)$-fine tagged partition, then
$R(g, P) < \epsilon$. □
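To get a feel for why the gauge $\delta(r_k) = \epsilon/2^{k+1}$ keeps the Riemann sum small, here is a small exact-arithmetic sketch. The particular enumeration of the rationals and the helper names are assumptions made for illustration. The bound also relies on the observation (not spelled out in the text above) that a single point can serve as the tag of at most two adjacent subintervals.

```python
from fractions import Fraction
from math import gcd


def rationals_in_unit_interval():
    """Yield 0, 1, 1/2, 1/3, 2/3, 1/4, 3/4, ...: each rational in [0, 1] once."""
    yield Fraction(0)
    yield Fraction(1)
    q = 2
    while True:
        for p in range(1, q):
            if gcd(p, q) == 1:
                yield Fraction(p, q)
        q += 1


def worst_case_sum(eps, how_many):
    """Bound on the contribution of r_1, ..., r_how_many used as tags.

    Each r_k tags at most two subintervals, each shorter than
    delta(r_k) = eps / 2**(k+1), and g(r_k) = 1.
    """
    total = Fraction(0)
    for k, _r in zip(range(1, how_many + 1), rationals_in_unit_interval()):
        delta = eps / 2 ** (k + 1)
        total += 2 * delta
    return total


eps = Fraction(1, 10)
for count in (1, 5, 20):
    print(count, float(worst_case_sum(eps, count)))
```

The printed bounds increase toward $\epsilon$ without reaching it, which matches the geometric series $\sum_k \epsilon/2^k = \epsilon$.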

Dirichlet's function fails to be Riemann-integrable because, given any (untagged)
partition, it is possible to make $R(g, P) = 1$ or $R(g, P) = 0$ by choosing
the tags to be either all rational or all irrational. For the generalized Riemann
integral, choosing all rational tags results in a tagged partition that is
not $\delta(x)$-fine (when $\delta(x)$ is small on rational points) and so does not have to be
considered. In general, allowing for nonconstant gauges allows us to be more
discriminating about which tagged partitions qualify as $\delta(x)$-fine. The result,
as we have just seen, is that it may be easier to achieve the inequality

$$|R(f, P) - A| < \epsilon$$

for the often smaller and more carefully selected set of tagged partitions that
remain.


The  Fundamental  Theorem  of  Calculus 

We conclude this brief introduction to the generalized Riemann integral with a
proof of the Fundamental Theorem of Calculus. As was alluded to earlier, the
most notable distinction between the following theorem and part (i) of Theorem
7.5.1 is that here we do not need to assume that the derivative function is integrable.
Using the generalized Riemann integral, every derivative is integrable,
and the integral can be evaluated using the antiderivative in the familiar way.
It is also interesting to note that in Theorem 7.5.1 the Mean Value Theorem
played the crucial role in the argument, but it is not needed here.

Theorem 8.1.9. Assume $F : [a, b] \to \mathbf{R}$ is differentiable at each point in $[a, b]$
and set $f(x) = F'(x)$. Then, $f$ has the generalized Riemann integral

$$\int_a^b f = F(b) - F(a).$$


Proof. Let $P = \{x_0, x_1, x_2, \ldots, x_n\}$ be a partition of $[a, b]$. Both this proof and
the proof of Theorem 7.5.1 make use of the following fact.

Exercise 8.1.11. Show that

$$F(b) - F(a) = \sum_{k=1}^n [F(x_k) - F(x_{k-1})].$$

If $\{c_k\}_{k=1}^n$ is a set of tags for $P$, then we can estimate the difference between
the Riemann sum $R(f, P)$ and $F(b) - F(a)$ by

$$|F(b) - F(a) - R(f, P)| = \left| \sum_{k=1}^n [F(x_k) - F(x_{k-1}) - f(c_k)(x_k - x_{k-1})] \right| \le \sum_{k=1}^n |F(x_k) - F(x_{k-1}) - f(c_k)(x_k - x_{k-1})|.$$

Let $\epsilon > 0$. To prove the theorem, we must construct a gauge $\delta(c)$ such that

$$|F(b) - F(a) - R(f, P)| < \epsilon \tag{2}$$

for all $(P, \{c_k\})$ that are $\delta(c)$-fine. (Using the variable $c$ in the gauge function
is more convenient than $x$ in this case.)

Exercise 8.1.12. For each $c \in [a, b]$, explain why there exists a $\delta(c) > 0$ (a
$\delta > 0$ depending on $c$) such that

$$\left| \frac{F(x) - F(c)}{x - c} - f(c) \right| < \epsilon$$

for all $0 < |x - c| < \delta(c)$.

This $\delta(c)$ is the desired gauge on $[a, b]$. Let $(P, \{c_k\}_{k=1}^n)$ be a $\delta(c)$-fine partition
of $[a, b]$. It just remains to show that equation (2) is satisfied for this tagged
partition.

Exercise 8.1.13. (a) For a particular $c_k \in [x_{k-1}, x_k]$ of $P$, show that

$$|F(x_k) - F(c_k) - f(c_k)(x_k - c_k)| \le \epsilon(x_k - c_k)$$

and

$$|F(c_k) - F(x_{k-1}) - f(c_k)(c_k - x_{k-1})| \le \epsilon(c_k - x_{k-1}).$$

(b) Now, argue that

$$|F(x_k) - F(x_{k-1}) - f(c_k)(x_k - x_{k-1})| < \epsilon(x_k - x_{k-1}),$$

and use this fact to complete the proof of the theorem. □

If we consider the function

$$F(x) = \begin{cases} x^{3/2} \sin(1/x) & \text{if } x \ne 0 \\ 0 & \text{if } x = 0, \end{cases}$$

then it is not too difficult to show that $F$ is differentiable everywhere, including
$x = 0$, with

$$F'(x) = \begin{cases} (3/2)\sqrt{x} \sin(1/x) - (1/\sqrt{x}) \cos(1/x) & \text{if } x \ne 0 \\ 0 & \text{if } x = 0. \end{cases}$$
What is notable here is that the derivative is unbounded near the origin. The
theory of the ordinary Riemann integral begins with the assumption that we
only consider bounded functions on closed intervals, but there is no such restriction
for the generalized Riemann integral. Theorem 8.1.9 proves that $F'$
has a generalized integral. Now, improper Riemann integrals have been created
to extend Riemann integration to some unbounded functions, but it is another
interesting fact about the generalized Riemann integral that any function having
an improper integral must already be integrable in the sense described in
Definition 8.1.6.
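The unboundedness of this derivative is easy to observe numerically. The sketch below restricts attention to $x \ge 0$ (an assumption made so that `x ** 1.5` is real-valued) and samples $F'$ at the points $x_k = 1/(k\pi)$, where $\sin(1/x_k) = 0$ and $\cos(1/x_k) = \pm 1$, so that $|F'(x_k)| = \sqrt{k\pi}$. The function names are illustrative.

```python
import math


def F(x):
    """F(x) = x^(3/2) sin(1/x) for x > 0, and F(0) = 0."""
    return x ** 1.5 * math.sin(1 / x) if x > 0 else 0.0


def F_prime(x):
    """Derivative of F on [0, infinity), using the formula from the text."""
    if x > 0:
        return 1.5 * math.sqrt(x) * math.sin(1 / x) - math.cos(1 / x) / math.sqrt(x)
    return 0.0


# Sample |F'| along x_k = 1/(k*pi), which tends to 0 as k grows.
samples = [abs(F_prime(1 / (k * math.pi))) for k in (1, 10, 100, 1000, 10000)]
for k, value in zip((1, 10, 100, 1000, 10000), samples):
    print(k, value, math.sqrt(k * math.pi))
```

Even though $F'$ blows up along this sequence, Theorem 8.1.9 still assigns $F'$ the generalized Riemann integral $F(1) - F(0) = \sin(1)$ on $[0, 1]$.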

As  a  parting  gesture,  let’s  show  how  Theorem  8.1.9  yields  a  short  verification 
of  the  substitution  technique  from  calculus. 

Theorem 8.1.10 (Change-of-variable Formula). Let $g : [a, b] \to \mathbf{R}$ be
differentiable at each point of $[a, b]$, and assume $F$ is differentiable on the set
$g([a, b])$. If $f(x) = F'(x)$ for all $x \in g([a, b])$, then

$$\int_a^b (f \circ g) \cdot g' = \int_{g(a)}^{g(b)} f.$$

Proof. The hypothesis of the theorem guarantees that the function $(F \circ g)(x)$
is differentiable for all $x \in [a, b]$.

Exercise 8.1.14. (a) Why are we sure that $f$ and $(F \circ g)'$ have generalized
Riemann integrals?

(b) Use Theorem 8.1.9 to finish the proof.

□
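A quick numerical sanity check of the change-of-variable formula is sketched below for one smooth example. The choices $g(x) = x^2$, $F = \sin$, $f = \cos$ on $[0, 1]$ and the midpoint-rule helper are assumptions made for illustration. Both integrands here are continuous, so ordinary Riemann sums suffice and both sides should be close to $\sin(1)$.

```python
import math


def midpoint_rule(func, a, b, n=20000):
    """Approximate the integral of func over [a, b] with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


def g(x):
    return x * x


def g_prime(x):
    return 2 * x


f = math.cos  # f = F' with F = sin

lhs = midpoint_rule(lambda x: f(g(x)) * g_prime(x), 0.0, 1.0)
rhs = midpoint_rule(f, g(0.0), g(1.0))
print(lhs, rhs, math.sin(1.0))
```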


The impressive properties of the generalized Riemann integral do not end
here. The central source for the material in this section is Robert Bartle's
award-winning article “Return to the Riemann Integral,” which appeared in the
American Mathematical Monthly, October 1996. The article goes on to discuss
convergence theorems for this new integral in the spirit of Theorem 7.4.4, and
outlines the argument that the collection of integrable functions is strictly larger
when the Lebesgue integral is replaced by the generalized Riemann integral. In
light of this, the author boldly declares that “the time has come to discard the
Lebesgue integral as the primary integral.” (Italics in the original.)

That  this  revolution  has  not  come  to  pass  may  simply  be  due  to  a  case  of 
overwhelming  inertia,  but  a  contributing  factor  is  very  likely  the  geometrically 
satisfying  intuition  of  Lebesgue’s  theory.  At  the  heart  of  Lebesgue’s  approach  to 
integration  is  the  desire  to  generalize  the  concepts  of  length  and  area.  Although 
one  can  certainly  use  a  properly  developed  integral  to  give  a  rigorous  definition 
for  the  length — or  measure — of  a  general  set,  there  is  a  compelling  argument 
that  this  puts  the  ideas  in  the  wrong  pedagogical  order.  Rather  than  using  a 
sophisticated  integral  to  generalize  a  primitive  notion  such  as  length,  Lebesgue 
found  an  effective  way  to  talk  about  the  length  of  a  very  wide  class  of  sets,  and 
used  that  to  build  his  definition  of  the  integral.  The  very  elegant  result  of  his 
endeavor  is  likely  to  be  the  industry  standard  for  a  long  time  to  come. 
8.2 Metric Spaces and the Baire Category Theorem

A natural question to ask is whether the theorems we have proved about sequences,
series, and functions in $\mathbf{R}$ have analogues in the plane $\mathbf{R}^2$ or in even
higher dimensions. Looking back over the proofs, one crucial observation is
that most of the arguments depend on just a few basic properties of the absolute
value function. Interpreting the statement “$|x - y|$” to mean the “distance
from $x$ to $y$ in $\mathbf{R}$,” our aim is to experiment with other ways of measuring distance
on other sets such as $\mathbf{R}^2$ and $C[0, 1]$, the space of continuous functions on
$[0, 1]$.

Definition 8.2.1. Given a set $X$, a function $d : X \times X \to \mathbf{R}$ is a metric on $X$
if for all $x, y \in X$:

(i) $d(x, y) \ge 0$ with $d(x, y) = 0$ if and only if $x = y$,

(ii) $d(x, y) = d(y, x)$, and

(iii) for all $z \in X$, $d(x, y) \le d(x, z) + d(z, y)$.

A  metric  space  is  a  set  X  together  with  a  metric  d. 

Property  (iii)  in  the  previous  definition  is  the  “triangle  inequality.”  The  next 
two  exercises  illustrate  the  point  that  the  same  set  X  can  be  home  to  several 
different  metrics.  When  referring  to  a  metric  space,  we  must  specify  the  set  and 
the  particular  distance  function  d. 

Exercise 8.2.1. Decide which of the following are metrics on $X = \mathbf{R}^2$. For
each, we let $x = (x_1, x_2)$ and $y = (y_1, y_2)$ be points in the plane.

(a) $d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}$.

(b) $d(x, y) = \max\{|x_1 - y_1|, |x_2 - y_2|\}$.

(c) $d(x, y) = |x_1 x_2 + y_1 y_2|$.

The metric in part (a) of the previous exercise is the familiar Euclidean
distance between two points in the plane. This is often referred to as the “usual”
or “standard” metric on $\mathbf{R}^2$. The usual metric on $\mathbf{R}$ is our old friend $d(x, y) = |x - y|$.

Exercise 8.2.2. Let $C[0, 1]$ be the collection of continuous functions on the
closed interval $[0, 1]$. Decide which of the following are metrics on $C[0, 1]$.

(a) $d(f, g) = \sup\{|f(x) - g(x)| : x \in [0, 1]\}$.

(b) $d(f, g) = |f(1) - g(1)|$.

(c) $d(f, g) = \int_0^1 |f - g|$.
The following distance function is called the discrete metric and can be
defined on any set $X$. For any $x, y \in X$, let

$$\rho(x, y) = \begin{cases} 1 & \text{if } x \ne y \\ 0 & \text{if } x = y. \end{cases}$$


Exercise  8.2.3.  Verify  that  the  discrete  metric  is  actually  a  metric. 
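A finite spot-check can refute a proposed metric, although it can never prove that the axioms hold in general. The sketch below tests properties (i) through (iii) of Definition 8.2.1 on a handful of sample points. The helper name `metric_violations`, the sample points, and the non-example $d(x, y) = (x - y)^2$ on $\mathbf{R}$ are illustrative assumptions and do not come from the text.

```python
from itertools import product


def metric_violations(d, points, tol=1e-12):
    """Return the axioms of Definition 8.2.1 that fail somewhere on `points`."""
    failed = set()
    for x, y in product(points, repeat=2):
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):
            failed.add("(i)")
        if abs(d(x, y) - d(y, x)) > tol:
            failed.add("(ii)")
    for x, y, z in product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y) + tol:
            failed.add("(iii)")
    return sorted(failed)


def discrete(x, y):
    return 0 if x == y else 1


def usual(x, y):
    return abs(x - y)


def squared(x, y):
    # Hypothetical non-example: violates the triangle inequality.
    return (x - y) ** 2


points = [0.0, 0.5, 1.0, 2.0, -3.0]
print(metric_violations(discrete, points))
print(metric_violations(usual, points))
print(metric_violations(squared, points))
```

An empty list only means that no counterexample was found among the sampled points; a proof must still come from the definitions.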


Basic  Definitions 

Definition 8.2.2. Let $(X, d)$ be a metric space. A sequence $(x_n) \subseteq X$ converges
to an element $x \in X$ if for all $\epsilon > 0$ there exists an $N \in \mathbf{N}$ such that $d(x_n, x) < \epsilon$
whenever $n \ge N$.

Definition 8.2.3. A sequence $(x_n)$ in a metric space $(X, d)$ is a Cauchy sequence
if for all $\epsilon > 0$ there exists an $N \in \mathbf{N}$ such that $d(x_m, x_n) < \epsilon$ whenever
$m, n \ge N$.

Exercise  8.2.4.  Show  that  a  convergent  sequence  is  Cauchy. 

The  Cauchy  Criterion,  as  it  is  called  in  R,  was  an  “if  and  only  if”  statement. 
In  the  general  metric  space  setting,  however,  the  converse  statement  does  not 
always  hold.  Recall  that,  in  R,  the  assertion  that  “Cauchy  sequences  converge” 
was  shown  to  be  equivalent  to  the  Axiom  of  Completeness.  In  order  to  transport 
the  Axiom  of  Completeness  into  a  metric  space,  we  would  need  to  have  an 
ordering  on  our  space  so  that  we  could  discuss  such  things  as  upper  bounds.  It 
is  an  interesting  observation  that  not  every  set  can  be  ordered  in  a  satisfying 
way  (the  points  in  R2  for  example).  Even  without  an  ordering,  we  are  still  going 
to  want  completeness.  For  metric  spaces,  the  convergence  of  Cauchy  sequences 
is  taken  to  be  the  definition  of  completeness. 

Definition  8.2.4.  A  metric  space  (X,  d)  is  complete  if  every  Cauchy  sequence 
in  X  converges  to  an  element  of  X. 

Exercise 8.2.5. (a) Consider $\mathbf{R}^2$ with the discrete metric $\rho(x, y)$ examined
in Exercise 8.2.3. What do Cauchy sequences look like in this space? Is
$\mathbf{R}^2$ complete with respect to this metric?

(b) Show that $C[0, 1]$ is complete with respect to the metric in Exercise
8.2.2 (a).

(c) Define $C^1[0, 1]$ to be the collection of differentiable functions on $[0, 1]$ whose
derivatives are also continuous. Is $C^1[0, 1]$ complete with respect to the
metric defined in Exercise 8.2.2 (a)?

Because completeness is a prerequisite for doing anything significant in the
way of analysis, the metric in Exercise 8.2.2 (a) is the most natural metric to
consider when working with $C[0, 1]$. The notation

$$\|f - g\|_\infty = d(f, g) = \sup\{|f(x) - g(x)| : x \in [0, 1]\}$$

is standard, and setting $g = 0$ gives the so-called “sup norm”

$$\|f\|_\infty = d(f, 0) = \sup\{|f(x)| : x \in [0, 1]\}.$$

In all upcoming discussions, it is assumed that the space $C[0, 1]$ is endowed with
this metric unless otherwise specified.
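For concrete functions, the sup distance can be estimated by sampling. The sketch below takes the maximum of $|f - g|$ over a uniform grid, which is always a lower bound for $\|f - g\|_\infty$ and is usually close to it for well-behaved functions. The helper name `sup_distance` and the example pair $f(x) = x$, $g(x) = x^2$ are assumptions chosen for illustration; for that pair the exact value is $1/4$, attained at $x = 1/2$.

```python
def sup_distance(f, g, samples=1001):
    """Estimate ||f - g||_inf on [0, 1] by the maximum over a uniform grid."""
    grid = (i / (samples - 1) for i in range(samples))
    return max(abs(f(x) - g(x)) for x in grid)


def identity(x):
    return x


def square(x):
    return x * x


print(sup_distance(identity, square))
```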

Definition 8.2.5. Let $(X, d_1)$ and $(Y, d_2)$ be metric spaces. A function
$f : X \to Y$ is continuous at $x \in X$ if for all $\epsilon > 0$ there exists a $\delta > 0$ such that
$d_2(f(x), f(y)) < \epsilon$ whenever $d_1(x, y) < \delta$.

Exercise 8.2.6. Which of these functions from $C[0, 1]$ to $\mathbf{R}$ (with the usual
metric) are continuous?

(a) $g(f) = \int_0^1 f k$, where $k$ is some fixed function in $C[0, 1]$.

(b) $g(f) = f(1/2)$.

(c) $g(f) = f(1/2)$, but this time with respect to the metric on $C[0, 1]$ from
Exercise 8.2.2 (c).


Topology  on  Metric  Spaces 

Definition 8.2.6. Given $\epsilon > 0$ and an element $x$ in the metric space $(X, d)$,
the $\epsilon$-neighborhood of $x$ is the set $V_\epsilon(x) = \{y \in X : d(x, y) < \epsilon\}$.

Exercise 8.2.7. Describe the $\epsilon$-neighborhoods in $\mathbf{R}^2$ for each of the different
metrics described in Exercise 8.2.1. How about for the discrete metric?

With the definition of an $\epsilon$-neighborhood, we can now define open sets, limit
points, and closed sets exactly as we did before. A set $O \subseteq X$ is open if for
every $x \in O$ we can find a neighborhood $V_\epsilon(x) \subseteq O$. A point $x$ is a limit point
of a set $A$ if every $V_\epsilon(x)$ intersects $A$ in some point other than $x$. A set $C$ is
closed if it contains its limit points.

Exercise 8.2.8. Let $(X, d)$ be a metric space.

(a) Verify that a typical $\epsilon$-neighborhood $V_\epsilon(x)$ is an open set. Is the set

$$C_\epsilon(x) = \{y \in X : d(x, y) \le \epsilon\}$$

a closed set?

(b) Show that a set $E \subseteq X$ is open if and only if its complement is closed.

Exercise 8.2.9. (a) Show that the set $Y = \{f \in C[0, 1] : \|f\|_\infty \le 1\}$ is
closed in $C[0, 1]$.

(b) Is the set $T = \{f \in C[0, 1] : f(0) = 0\}$ open, closed, or neither in $C[0, 1]$?

We define compactness in metric spaces just as we did for $\mathbf{R}$.

Definition 8.2.7. A subset $K$ of a metric space $(X, d)$ is compact if every
sequence in $K$ has a convergent subsequence that converges to a limit in $K$.

An  extremely  useful  characterization  of  compactness  in  R  is  the  proposition 
that  a  set  is  compact  if  and  only  if  it  is  closed  and  bounded.  For  abstract  metric 
spaces,  this  proposition  only  holds  in  the  forward  direction. 

Exercise 8.2.10. (a) Supply a definition for bounded subsets of a metric
space $(X, d)$.

(b) Show that if $K$ is a compact subset of the metric space $(X, d)$, then $K$ is
closed and bounded.

(c) Show that $Y \subseteq C[0, 1]$ from Exercise 8.2.9 (a) is closed and bounded but
not compact.

A good hint for part (c) of the previous exercise can be found in Exercise
6.2.14 from Chapter 6. This exercise defines the concept of an equicontinuous
family of functions, which is a key ingredient in the Arzela-Ascoli Theorem
(Exercise 6.2.15). The Arzela-Ascoli Theorem states that any bounded,
equicontinuous collection of functions in $C[0, 1]$ must have a uniformly convergent
subsequence. One way to summarize this famous result—which we did not
have the language for in Chapter 6—is as a statement describing a particular
class of compact subsets in $C[0, 1]$. Looking at the definition of compactness,
and remembering that the uniform limit of continuous functions is continuous,
the Arzela-Ascoli Theorem states that any closed, bounded, equicontinuous
collection of functions is a compact subset of $C[0, 1]$.

Definition 8.2.8. Given a subset $E$ of a metric space $(X, d)$, the closure $\overline{E}$ is
the union of $E$ together with its limit points. The interior of $E$ is denoted by
$E^\circ$ and is defined as

$$E^\circ = \{x \in E : \text{there exists } V_\epsilon(x) \subseteq E\}.$$

Closure and interior are dual concepts. Results about these concepts come
in pairs and exhibit an elegant and useful symmetry.

Exercise 8.2.11. (a) Show that $E$ is closed if and only if $\overline{E} = E$. Show that
$E$ is open if and only if $E^\circ = E$.

(b) Show that $\overline{E}^c = (E^c)^\circ$, and similarly that $(E^\circ)^c = \overline{E^c}$.

A good hint for this exercise is to review the proofs from Chapter 3, where
closure at least is discussed. Thinking of all of these concepts as they relate
to $\mathbf{R}$ or $\mathbf{R}^2$ with the usual metric is not a bad idea. However, it is important
to remember also that rigorous proofs must be constructed purely from the
relevant definitions.

Exercise 8.2.12. (a) Show

$$\overline{V_\epsilon(x)} \subseteq \{y \in X : d(x, y) \le \epsilon\},$$

in an arbitrary metric space $(X, d)$.

(b) To keep things from sounding too familiar, find an example of a specific
metric space where

$$\overline{V_\epsilon(x)} \ne \{y \in X : d(x, y) \le \epsilon\}.$$

We are on our way toward the Baire Category Theorem. The next definitions
provide the final bit of vocabulary needed to state the result.

Definition 8.2.9. A set $A \subseteq X$ is dense in the metric space $(X, d)$ if $\overline{A} = X$.
A subset $E$ of a metric space $(X, d)$ is nowhere-dense in $X$ if $\overline{E}^\circ$ is empty.

Exercise 8.2.13. If $E$ is a subset of a metric space $(X, d)$, show that $E$ is
nowhere-dense in $X$ if and only if $\overline{E}^c$ is dense in $X$.

The  Baire  Category  Theorem 

In  Section  3.5,  we  proved  Baire’s  Theorem,  which  states  that  it  is  impossible  to 
write  the  real  numbers  R  as  the  countable  union  of  nowhere-dense  sets.  Previous 
to  this,  we  knew  that  R  was  too  big  to  be  written  as  the  countable  union  of  single 
points  (R  is  uncountable),  but  Baire’s  Theorem  improves  on  this  by  asserting 
that  the  only  way  to  make  R  from  a  countable  union  of  arbitrary  sets  is  for 
the  closure  of  at  least  one  of  these  sets  to  contain  an  interval.  The  keystone 
to  the  proof  of  Baire’s  Theorem  is  the  completeness  of  R.  The  idea  now  is  to 
replace  R  with  an  arbitrary  complete  metric  space  and  prove  the  theorem  in 
this  more  general  setting.  This  leads  to  a  statement  that  can  be  used  to  discuss 
the size and structure of other spaces such as $\mathbf{R}^2$ and $C[0, 1]$. At the end of
Chapter 3, we mentioned one particularly fascinating implication of this result
for $C[0, 1]$, which is that—despite the substantial difficulty required to produce
an  example  of  one — most  continuous  functions  are  nowhere-differentiable.  It 
would  be  a  good  idea  at  this  point  to  reread  Sections  3.6  and  5.5.  We  are  now 
equipped  to  carry  out  the  details  promised  in  these  discussions. 

Theorem 8.2.10. Let $(X, d)$ be a complete metric space, and let $\{O_n\}$ be a
countable collection of dense, open subsets of $X$. Then, $\bigcap_{n=1}^\infty O_n$ is not empty.

Proof. When we proved this theorem on $\mathbf{R}$, completeness manifested itself in
the form of the Nested Interval Property. We could derive something akin
to NIP in the metric space setting, but instead let's take an approach that
uses the convergence of Cauchy sequences (because this is how we have defined
completeness).

Pick $x_1 \in O_1$. Because $O_1$ is open, there exists an $\epsilon_1 > 0$ such that
$\overline{V_{\epsilon_1}(x_1)} \subseteq O_1$.

Exercise 8.2.14. (a) Give the details for why we know there exists a point
$x_2 \in V_{\epsilon_1}(x_1) \cap O_2$ and an $\epsilon_2 > 0$ satisfying $\epsilon_2 < \epsilon_1/2$ with $\overline{V_{\epsilon_2}(x_2)}$ contained
in $O_2$ and

$$\overline{V_{\epsilon_2}(x_2)} \subseteq V_{\epsilon_1}(x_1).$$

(b) Proceed along this line and use the completeness of $(X, d)$ to produce a
single point $x \in O_n$ for every $n \in \mathbf{N}$. □

Theorem  8.2.11  (Baire  Category  Theorem).  A  complete  metric  space  is 
not  the  union  of  a  countable  collection  of  nowhere-dense  sets. 

Exercise  8.2.15.  Complete  the  proof  of  the  theorem. 

This  result  is  called  the  Baire  Category  Theorem  because  it  creates  two 
categories  of  size  for  subsets  in  a  metric  space.  A  set  of  “first  category”  is  one 
that  can  be  written  as  a  countable  union  of  nowhere-dense  sets.  These  are  the 
small,  intuitively  thin  subsets  of  a  metric  space.  We  now  see  that  if  our  metric 
space  is  complete,  then  it  is  necessarily  of  “second  category,”  meaning  it  cannot 
be  written  as  a  countable  union  of  nowhere-dense  sets.  Given  a  subset  A  of  a 
complete  metric  space  X,  showing  that  A  is  of  first  category  is  a  mathematically 
precise  way  of  demonstrating  that  A  constitutes  a  very  minor  portion  of  the  set 
X.  The  term  “meager”  is  often  used  to  mean  a  set  of  first  category. 

With the stage set, we now outline the argument that continuous functions
that are differentiable at even one point of $[0, 1]$ form a meager subset of the
metric space $C[0, 1]$.

Theorem 8.2.12. The set

$$D = \{f \in C[0, 1] : f'(x) \text{ exists for some } x \in [0, 1]\}$$

is a set of first category in $C[0, 1]$.

Proof. For each pair of natural numbers $m, n$, define

$$A_{m,n} = \left\{ f \in C[0, 1] : \text{there exists } x \in [0, 1] \text{ where } \left| \frac{f(x) - f(t)}{x - t} \right| \le n \text{ whenever } 0 < |x - t| < \frac{1}{m} \right\}.$$

This definition takes some time to digest. Think of $1/m$ as defining a $\delta$-neighborhood
around the point $x$, and view $n$ as an upper bound on the magnitude
of the slopes of lines through the two points $(x, f(x))$ and $(t, f(t))$. The
set $A_{m,n}$ contains any function in $C[0, 1]$ for which it is possible to find at
least one point $x$ where the slopes through $(x, f(x))$ and points on the function
nearby—within $1/m$ to be precise—are bounded by $n$.

Exercise 8.2.16. Show that if $f \in C[0, 1]$ is differentiable at a point $x \in [0, 1]$,
then $f \in A_{m,n}$ for some pair $m, n \in \mathbf{N}$.

The collection of subsets $\{A_{m,n} : m, n \in \mathbf{N}\}$ is countable, and we have just
seen that the union of these sets contains our set $D$. Because it is not difficult
to see that a subset of a set of first category is first category, the final hurdle in
the argument is to prove that each $A_{m,n}$ is nowhere-dense in $C[0, 1]$.
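The definition of $A_{m,n}$ can be explored on a grid. The sketch below is only a discretized stand-in for the real condition, since it looks at finitely many points $x$ and $t$. It asks whether some grid point $x$ has all of its nearby difference quotients bounded by $n$. The helper names, the grid size, and the two test functions (a tent function with slopes $\pm 1$ and a steep sawtooth) are assumptions made for illustration.

```python
def in_A_mn_on_grid(f, m, n, samples=401, tol=1e-12):
    """Grid version of membership in A_{m,n}.

    Returns True if some grid point x satisfies |f(x) - f(t)| <= n |x - t|
    for every grid point t with 0 < |x - t| < 1/m.
    """
    grid = [i / (samples - 1) for i in range(samples)]
    values = [f(x) for x in grid]
    for x, fx in zip(grid, values):
        if all(abs(fx - ft) <= n * abs(x - t) + tol
               for t, ft in zip(grid, values) if 0 < abs(x - t) < 1 / m):
            return True
    return False


def tent(x):
    """Polygonal function with slopes -1 and +1."""
    return abs(x - 0.5)


def sawtooth(slope):
    """Polygonal function with values in [0, 1] and slopes +slope and -slope."""
    def h(x):
        u = (slope * x) % 2
        return u if u <= 1 else 2 - u
    return h


print(in_A_mn_on_grid(tent, 2, 1))
print(in_A_mn_on_grid(sawtooth(50), 10, 5))
```

On this grid the tent function passes the test with $n = 1$, while the sawtooth with slopes $\pm 50$ fails it with $n = 5$ because every grid point has a neighbor joined to it by a steep segment.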
Fix $m$ and $n$. The first order of business is to prove that $A_{m,n}$ is a closed
set. To this end, let $(f_k)$ be a sequence in $A_{m,n}$ and assume $f_k \to f$ in $C[0, 1]$.
We need to show $f \in A_{m,n}$.

Because $f_k \in A_{m,n}$, then for each $k \in \mathbf{N}$ there exists a point $x_k \in [0, 1]$
where

$$\left| \frac{f_k(x_k) - f_k(t)}{x_k - t} \right| \le n \quad \text{for all } 0 < |x_k - t| < 1/m.$$

Exercise 8.2.17. (a) The sequence $(x_k)$ does not necessarily converge, but
explain why there exists a subsequence $(x_{k_l})$ that is convergent. Let $x = \lim(x_{k_l})$.

(b) Prove that $f_{k_l}(x_{k_l}) \to f(x)$.

(c) Now finish the proof that $A_{m,n}$ is closed.


Because $A_{m,n}$ is closed, $\overline{A_{m,n}} = A_{m,n}$. In order to prove that $A_{m,n}$ is
nowhere-dense, we just have to show that it contains no $\epsilon$-neighborhoods, so
pick an arbitrary $f \in A_{m,n}$, let $\epsilon > 0$, and consider the $\epsilon$-neighborhood $V_\epsilon(f)$
in $C[0, 1]$. To show that this set is not contained in $A_{m,n}$, we must produce a
function $g \in C[0, 1]$ that satisfies $\|f - g\|_\infty < \epsilon$ and has the property that there
is no point $x \in [0, 1]$ where

$$\left| \frac{g(x) - g(t)}{x - t} \right| \le n \quad \text{for all } 0 < |x - t| < 1/m.$$


Exercise 8.2.18. A continuous function is called polygonal if its graph consists
of a finite number of line segments.

(a) Show that there exists a polygonal function $p \in C[0, 1]$ satisfying
$\|f - p\|_\infty < \epsilon/2$.

(b) Show that if $h$ is any function in $C[0, 1]$ that is bounded by 1, then the
function

$$g(x) = p(x) + \frac{\epsilon}{2} h(x)$$

satisfies $g \in V_\epsilon(f)$.

(c) Construct a polygonal function $h(x)$ in $C[0, 1]$ that is bounded by 1 and
leads to the conclusion $g \notin A_{m,n}$, where $g$ is defined as in (b). Explain
how this completes the argument for Theorem 8.2.12. □


8.3 Euler's Sum

In Section 6.1 we saw Euler's first and most famous derivation of the formula

$$1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \frac{1}{25} + \cdots = \frac{\pi^2}{6}.$$
At the crux of this argument are two representations for the function $\sin(x)$.
The first is the standard Taylor series representation

$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots, \tag{1}$$

and the second is an infinite product representation

$$\sin(x) = x \left(1 - \frac{x}{\pi}\right)\left(1 + \frac{x}{\pi}\right)\left(1 - \frac{x}{2\pi}\right)\left(1 + \frac{x}{2\pi}\right) \cdots. \tag{2}$$

Although we have since made rigorous sense of the first equation (Example 6.6.4),
proving the validity of equation (2) is still beyond our means.

The news is not all bad, however. In the time since Euler first made this
discovery, dozens of different proofs for this result have been published, starting
with several by Euler himself and continuing right up to the present. The
machinery required in these arguments runs the gamut from multivariable calculus
to Fourier series to complex integration, but one in particular due to Boo
Rim Choe relies mainly on Taylor series expansions and properties of uniformly
convergent series. Choe's argument was published in 1987 but actually has much
in common with one of Euler's original attempts. The proof outlined in this
section follows Choe's argument with some simplifications due to Peter Duren.


Wallis’s  Product 


Even  though  we  don’t  currently  have  the  tools  to  prove  the  infinite  product 
formula  for  sin(x)  in  equation  (2),  we  can  prove  a  special  case. 

Exercise 8.3.1. Supply the details to show that when $x = \pi/2$ the product
formula in (2) is equivalent to

$$\frac{\pi}{2} = \lim_{n \to \infty} \left(\frac{2 \cdot 2}{1 \cdot 3}\right)\left(\frac{4 \cdot 4}{3 \cdot 5}\right)\left(\frac{6 \cdot 6}{5 \cdot 7}\right) \cdots \left(\frac{2n \cdot 2n}{(2n - 1)(2n + 1)}\right), \tag{3}$$

where the infinite product in (2) is interpreted to be a limit of partial products.
(Although it is not necessary for what follows, it might be useful to review the
treatment of infinite products in Exercises 2.4.10 and 2.7.10.)

The goal of the next few exercises is to supply a proper proof for equation (3).
This curious formula involving $\pi$ was first discovered by John Wallis (1616-1703)
and will provide some key ingredients for our proof of Euler's sum. It resurfaces
in Section 8.4 where the factorial function is defined.
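Before proving Wallis's product, it can be reassuring to watch the partial products in equation (3) approach $\pi/2$. The following sketch simply multiplies out the first $n$ factors; the function name is an illustrative choice. Numerical agreement is of course evidence rather than proof, and the convergence is slow.

```python
import math


def wallis_partial_product(n):
    """Product of (2k * 2k) / ((2k - 1)(2k + 1)) for k = 1, ..., n."""
    prod = 1.0
    for k in range(1, n + 1):
        prod *= (2 * k) * (2 * k) / ((2 * k - 1) * (2 * k + 1))
    return prod


for n in (1, 10, 100, 1000, 100000):
    print(n, wallis_partial_product(n), math.pi / 2)
```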

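Set

$$b_n = \int_0^{\pi/2} \sin^n(x)\,dx, \quad \text{for } n = 0, 1, 2, \ldots.$$

The first few terms are easy enough to calculate; in particular,

$$b_0 = \int_0^{\pi/2} dx = \frac{\pi}{2} \quad \text{and} \quad b_1 = \int_0^{\pi/2} \sin(x)\,dx = 1.$$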


1  [13],  p.  92-95 
Exercise 8.3.2. Assume $h(x)$ and $k(x)$ have continuous derivatives on $[a, b]$
and derive the integration-by-parts formula

$$\int_a^b h(t)k'(t)\,dt = h(b)k(b) - h(a)k(a) - \int_a^b h'(t)k(t)\,dt.$$

Exercise 8.3.3. (a) Using the simple identity $\sin^n(x) = \sin^{n-1}(x)\sin(x)$ and
the previous exercise, derive the recurrence relation

$$b_n = \frac{n - 1}{n} b_{n-2} \quad \text{for all } n \ge 2.$$

(b) Use this relation to generate the first three even terms and the first three
odd terms of the sequence $(b_n)$.

(c) Write a general expression for $b_{2n}$ and $b_{2n+1}$.

Because $0 \le \sin^{n+1}(x) \le \sin^n(x)$ on $[0, \pi/2]$, it follows that $b_{n+1} \le b_n$ and
$(b_n)$ is decreasing. It turns out that $(b_n) \to 0$ but that isn't the limit we are
interested in at the moment.
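The recurrence for $b_n$ lends itself to a short computation. The sketch below starts from $b_0 = \pi/2$ or $b_1 = 1$ and applies $b_n = \frac{n-1}{n} b_{n-2}$ repeatedly, then compares the result with a direct midpoint-rule estimate of the integral. The function names and the number of quadrature steps are illustrative assumptions.

```python
import math


def b(n):
    """b_n = integral of sin^n over [0, pi/2], via b_n = (n - 1)/n * b_{n-2}."""
    value = math.pi / 2 if n % 2 == 0 else 1.0
    for j in range(2 + n % 2, n + 1, 2):
        value *= (j - 1) / j
    return value


def b_by_quadrature(n, steps=20000):
    """Midpoint-rule estimate of the same integral, as an independent check."""
    h = (math.pi / 2) / steps
    return h * sum(math.sin((i + 0.5) * h) ** n for i in range(steps))


for n in range(7):
    print(n, b(n), b_by_quadrature(n))
print(b(2000) / b(2001))
```

The last printed ratio compares a large even-indexed term with its odd-indexed successor, which is the quantity that matters in the next exercise.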


Exercise 8.3.4. Show

$$\lim_{n \to \infty} \frac{b_{2n}}{b_{2n+1}} = 1,$$

and use this fact to finish the proof of Wallis's product formula in (3).

There are some standard techniques for working with the notation of equation
(3). For instance,

$$2 \cdot 4 \cdot 6 \cdots (2n) = 2^n n!$$

and

$$1 \cdot 3 \cdot 5 \cdots (2n + 1) = \frac{(2n + 1)!}{2 \cdot 4 \cdot 6 \cdots (2n)} = \frac{(2n + 1)!}{2^n n!}.$$

Exercise 8.3.5. Derive the following alternative form of Wallis's product formula:

$$\sqrt{\pi} = \lim_{n \to \infty} \frac{2^{2n}(n!)^2}{(2n)!\sqrt{n}}.$$

Taylor  Series 

The next step in the argument is to generate the Taylor series for $\arcsin(x)$. This
is not really possible to do directly from Taylor's formula for the coefficients,
but keeping in mind that

$$(\arcsin(x))' = \frac{1}{\sqrt{1 - x^2}},$$

we can get where we want to go by first finding the expansion for $1/\sqrt{1 - x}$.

Exercise 8.3.6. Show that $1/\sqrt{1 - x}$ has Taylor expansion $\sum_{n=0}^\infty c_n x^n$, where
$c_0 = 1$ and

$$c_n = \frac{(2n)!}{2^{2n}(n!)^2} = \frac{1 \cdot 3 \cdot 5 \cdots (2n - 1)}{2 \cdot 4 \cdot 6 \cdots 2n}$$

for $n \ge 1$.

The coefficients $c_n$ should look familiar from our work on Wallis's product.
Exercise 8.3.5 can be rephrased as

$$\sqrt{\pi} = \lim_{n \to \infty} \frac{1}{c_n \sqrt{n}}.$$
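The limit $\sqrt{\pi} = \lim 1/(c_n\sqrt{n})$ can be checked numerically. The sketch below builds $c_n$ as the product of the ratios $(2k - 1)/(2k)$, which avoids the huge factorials in the closed form, and compares $1/(c_n\sqrt{n})$ with $\sqrt{\pi}$. The function name `c` mirrors the text's notation; the sampled values of $n$ are arbitrary choices.

```python
import math


def c(n):
    """c_n = (1 * 3 * 5 * ... * (2n - 1)) / (2 * 4 * 6 * ... * (2n)), with c_0 = 1."""
    value = 1.0
    for k in range(1, n + 1):
        value *= (2 * k - 1) / (2 * k)
    return value


for n in (1, 10, 100, 10000):
    print(n, c(n), 1 / (c(n) * math.sqrt(n)), math.sqrt(math.pi))
```

The table also shows $c_n$ shrinking toward 0, though only at the rate of $1/\sqrt{n}$.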

Exercise 8.3.7. Show that $\lim c_n = 0$ but $\sum_{n=0}^\infty c_n$ diverges.

The divergence of $\sum c_n$ makes sense when we consider the Taylor series
for $1/\sqrt{1 - x}$. We want to determine the values of $x$ for which

$$\frac{1}{\sqrt{1 - x}} = \sum_{n=0}^\infty c_n x^n, \tag{4}$$

and $x = 1$ is not in the domain of the left side. We do aim to prove (4) for all
$x \in (-1, 1)$ but the usual word of warning is in order. Having computed the
coefficients $c_n$, it is not enough to simply argue that the series on the right side
converges when $|x| < 1$. To properly establish (4) we are going to show that
the error function

$$E_N(x) = \frac{1}{\sqrt{1 - x}} - \sum_{n=0}^N c_n x^n$$

tends to zero as $N \to \infty$. Back in Section 6.6, the primary tool we used for this
task was Lagrange's Remainder Theorem (Theorem 6.6.3), but it is not up to
this particular challenge.


Exercise 8.3.8. Using the expression for $E_N(x)$ from Lagrange's Remainder Theorem, show that equation (4) is valid for all $|x| < 1/2$. What goes wrong when we try to use this method to prove (4) for $x \in (1/2, 1)$?


The  Integral  Form  of  the  Remainder 

The moral of the previous exercise is that we need a different method for estimating $E_N(x)$. The Lagrange form of the remainder grows out of the Mean Value Theorems and yields a formula for the error function in terms of the derivative $f^{(N+1)}$. Now that we are in possession of a proper definition of the integral, we can derive another useful formula for $E_N(x)$.

Theorem 8.3.1 (Integral Remainder Theorem). Let $f$ be differentiable $N+1$ times on $(-R, R)$ and assume $f^{(N+1)}$ is continuous. Define $a_n = f^{(n)}(0)/n!$ for $n = 0, 1, \ldots, N$, and let

$$S_N(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_N x^N.$$

For all $x \in (-R, R)$, the error function $E_N(x) = f(x) - S_N(x)$ satisfies

$$E_N(x) = \frac{1}{N!} \int_0^x f^{(N+1)}(t)(x-t)^N \, dt.$$

Proof. The case $x = 0$ is easy to check, so let's take $x \ne 0$ in $(-R, R)$ and keep in mind that $x$ is a fixed constant in what follows. To avoid a few technical distractions, let's just consider the case $x > 0$.

Exercise 8.3.9. (a) Show

$$f(x) = f(0) + \int_0^x f'(t) \, dt.$$

(b) Now use a previous result from this section to show

$$f(x) = f(0) + f'(0)x + \int_0^x f''(t)(x-t) \, dt.$$

(c) Continue in this fashion to complete the proof of the theorem. □

To gain a better understanding of this formulation for $E_N(x)$ and simultaneously make some headway on our exploration of equation (4), let's return to the special case $f(x) = 1/\sqrt{1-x}$.

Exercise 8.3.10. (a) Make a rough sketch of $1/\sqrt{1-x}$ and $S_2(x)$ over the interval $(-1, 1)$, and compute $E_2(x)$ for $x = 1/2$, $3/4$, and $8/9$.

(b) For a general $x$ satisfying $-1 < x < 1$, show

$$E_2(x) = \frac{15}{16} \int_0^x \left( \frac{x-t}{1-t} \right)^2 \frac{1}{(1-t)^{3/2}} \, dt.$$

(c) Explain why the inequality

$$\left| \frac{x-t}{1-t} \right| \le |x|$$

is valid, and use this to find an overestimate for $|E_2(x)|$ that no longer involves an integral. Note that this estimate will necessarily depend on $x$. Confirm that things are going well by checking that this overestimate is in fact larger than $|E_2(x)|$ at the three computed values from part (a).

(d) Finally, show $E_N(x) \to 0$ as $N \to \infty$ for an arbitrary $x \in (-1, 1)$.
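The behavior of the error function can be watched numerically. The sketch below, in Python, computes $E_N(x) = 1/\sqrt{1-x} - \sum_{n=0}^{N} c_n x^n$ directly from the coefficients of Exercise 8.3.6, using the ratio $c_{n+1}/c_n = (2n+1)/(2n+2)$ that follows from the product form of $c_n$. The function name is hypothetical, and the computation illustrates the claim in part (d) without proving it.

```python
import math

def taylor_error(x, N):
    """E_N(x) = 1/sqrt(1 - x) - sum_{n=0}^{N} c_n x^n for |x| < 1."""
    partial, term = 0.0, 1.0  # term holds c_n * x^n, starting from c_0 = 1
    for n in range(N + 1):
        partial += term
        term *= (2 * n + 1) / (2 * n + 2) * x  # c_{n+1} = c_n (2n+1)/(2n+2)
    return 1 / math.sqrt(1 - x) - partial

for x in (1 / 2, 3 / 4, 8 / 9):
    # The errors shrink as N grows, more slowly for x close to 1.
    print(x, [taylor_error(x, N) for N in (2, 10, 50, 200)])
```

The slower decay near $x = 1$ is consistent with the fact that the estimate in part (c) depends on $x$.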
Having established that the Taylor series in (4) does indeed converge for all $|x| < 1$, it is now clear sailing to produce a Taylor series representation for $\arcsin(x)$. The first step is to substitute $x^2$ for $x$ in (4) to get

$$\frac{1}{\sqrt{1-x^2}} = \sum_{n=0}^{\infty} c_n x^{2n} \quad \text{for all } |x| < 1.$$

The next step is to take the term-by-term anti-derivative of this series. Any
time  we  start  manipulating  infinite  series  as  though  they  were  finite  in  nature 
we  need  to  pause  and  make  sure  we  are  on  solid  footing. 

Exercise 8.3.11. Assuming that the derivative of $\arcsin(x)$ is indeed $1/\sqrt{1-x^2}$, supply the justification that allows us to conclude

$$\arcsin(x) = \sum_{n=0}^{\infty} \frac{c_n}{2n+1} x^{2n+1} \quad \text{for all } |x| < 1. \tag{5}$$

Exercise 8.3.12. Our work thus far shows that the Taylor series in (5) is valid for all $|x| < 1$, but note that $\arcsin(x)$ is continuous for all $|x| \le 1$. Carefully explain why the series in (5) converges uniformly to $\arcsin(x)$ on the closed interval $[-1, 1]$.


Summing $\sum_{n=1}^{\infty} 1/n^2$

Every  proof  of  Euler’s  sum  contains  a  moment  of  genuine  ingenuity  at  some 
point,  and  this  is  where  our  proof  takes  an  unanticipated  turn. 

Let's make the substitution $x = \sin(\theta)$ in (5) where we restrict our attention to $-\pi/2 \le \theta \le \pi/2$. The result is

$$\theta = \arcsin(\sin(\theta)) = \sum_{n=0}^{\infty} \frac{c_n}{2n+1} \sin^{2n+1}(\theta),$$

which converges uniformly on $[-\pi/2, \pi/2]$.

Exercise 8.3.13. (a) Show

$$\int_0^{\pi/2} \theta \, d\theta = \sum_{n=0}^{\infty} \frac{c_n}{2n+1} b_{2n+1},$$

being careful to justify each step in the argument. The term $b_{2n+1}$ refers back to our earlier work on Wallis's product.

(b) Deduce

$$\frac{\pi^2}{8} = \sum_{n=0}^{\infty} \frac{1}{(2n+1)^2},$$

and use this to finish the proof that $\pi^2/6 = \sum_{n=1}^{\infty} 1/n^2$.
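Both sums in part (b) are easy to approximate, which gives a quick check on the target values. The Python sketch below compares partial sums with $\pi^2/8$ and $\pi^2/6$; the function names and the number of terms are illustrative choices. Since the even-indexed terms $1/(2n)^2$ add up to one quarter of the full sum, the two limits should differ by a factor of $4/3$.

```python
import math

def odd_square_sum(terms):
    """Partial sum of 1/(2n+1)^2 for n = 0, ..., terms - 1."""
    return sum(1.0 / (2 * n + 1) ** 2 for n in range(terms))

def all_square_sum(terms):
    """Partial sum of 1/n^2 for n = 1, ..., terms."""
    return sum(1.0 / n ** 2 for n in range(1, terms + 1))

odd = odd_square_sum(10 ** 5)
full = all_square_sum(10 ** 5)
print(odd, math.pi ** 2 / 8)
print(full, math.pi ** 2 / 6)
print(full / odd)  # close to 4/3
```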
The Riemann-Zeta Function

Euler's determination of the value of $\sum 1/n^2$ brought him international recognition and represented a significant milestone in what would be a lifelong exploration of series of the form $\sum 1/n^s$. Euler's original argument for summing $\sum 1/n^2$ discussed in Section 6.1 involved equating the coefficient of $x^2$ in two different series expansions for $\sin(x)/x$. By equating the coefficients of higher powers of $x$ he was also able to sum $\sum 1/n^s$ for $s = 4, 6, 8, 10$ and $12$. (Try it for $s = 4$.) Eventually, Euler worked out a general formula for any even natural number, and in the process he shifted his focus to thinking about $\sum 1/n^s$ as a function of the variable $s$. The iconic notation

$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} \quad \text{for all } s > 1$$

and the name, the Riemann-zeta function, would come one hundred years later, but it was Euler who first unearthed many deep properties of this function. Significant among these is a connection to the prime numbers, evident in the Eulerian formula

$$\zeta(s) = \prod_{p} \frac{1}{1 - p^{-s}}, \tag{6}$$

where the product is taken over all the primes. The mathematics underlying the Riemann-zeta function gets complicated very quickly, but this particular formula is actually quite accessible. Notice that for each prime $p$,

$$\frac{1}{1 - p^{-s}} = 1 + \frac{1}{p^s} + \frac{1}{p^{2s}} + \frac{1}{p^{3s}} + \frac{1}{p^{4s}} + \cdots.$$

Multiplying out the product on the right in (6) in this fashion and using the fact that every $n \in \mathbf{N}$ is a unique product of primes leads naturally to the given relationship.
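The product formula can be tested numerically at $s = 2$, where the sum is known to be $\pi^2/6$. The Python sketch below truncates the product at the primes up to a chosen bound and compares it with a partial sum of the series. The helper names and the bound of 10000 are assumptions made for this illustration only.

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes returning all primes <= limit (limit >= 1)."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return [p for p, flag in enumerate(is_prime) if flag]

def euler_product(s, limit):
    """Product of 1 / (1 - p^(-s)) over the primes p <= limit."""
    product = 1.0
    for p in primes_up_to(limit):
        product *= 1.0 / (1.0 - p ** (-s))
    return product

def zeta_partial(s, terms):
    """Partial sum of 1/n^s for n = 1, ..., terms."""
    return sum(1.0 / n ** s for n in range(1, terms + 1))

print(euler_product(2, 10000), zeta_partial(2, 10 ** 5), math.pi ** 2 / 6)
```

Each omitted factor exceeds 1, so the truncated product approaches the limit from below.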

Euler returned to study $\zeta(s)$ many times in his career, in part it seems to tend to the unfinished business of evaluating $\sum 1/n^s$ for the odd integers. Amid his many successes, this was a challenge that eluded Euler, as it has eluded every mathematician since.


8.4  Inventing  the  Factorial  Function 

The goal of this section is to produce a function $f(x)$, defined on all of $\mathbf{R}$, with the property that $f(n) = n!$ for each $n \in \mathbf{N}$. With no other restriction on $f$, this is as easy as it is uninteresting: simply define $f$ piecewise in such a way that it passes through the points $(1, 1)$, $(2, 2)$, $(3, 6)$, $(4, 24)$, and so on. Letting

$$f(x) = \begin{cases} n! & \text{if } n \le x < n+1,\ n \in \mathbf{N} \\ 1 & \text{if } x < 1 \end{cases}$$

does the trick.

To make this problem meaningful we need to be much more discriminating about what properties we require $f$ to have. Should $f$ be continuous? Differentiable? Twice differentiable? We shall see about this. This problem actually has its origins in a series of letters exchanged in 1729 between Christian Goldbach (of “Goldbach's Conjecture” fame, although that is a different story) and Leonhard Euler. The term “function” in Euler's day implicitly referred to a mapping defined by an analytic expression composed of the elementary functions and operations of calculus. Logarithms, exponentials, polynomials, and power series were examples of 18th century functions; the piecewise concoction proposed above was not.

Thus, a better statement of our goal, although still a little imprecise, is to find a function defined by a single, organic formula which extends the definition of $n!$ in a meaningful way to non-natural numbers.

Exercise 8.4.1. For $n \in \mathbf{N}$, let

$$n\# = n + (n-1) + (n-2) + \cdots + 2 + 1.$$

(a)  Without  looking  ahead,  decide  if  there  is  a  natural  way  to  define  0#.  How 
about  (—2)#?  Conjecture  a  reasonable  value  for 

(b) Now prove $n\# = \frac{1}{2} n(n+1)$ for all $n \in \mathbf{N}$, and revisit part (a).

The formula in part (b) of the previous exercise not only simplifies the calculation of $n\#$ for large values of $n$, but also yields a properly defined function on $\mathbf{R}$ when the discrete variable $n$ is replaced with the continuous variable $x$. Indeed, Euler would be perfectly comfortable with the expression $x\# = \frac{1}{2} x(x+1)$.

We are seeking something similar for $n!$. What is the right definition for $x!$ when $x \in \mathbf{R}$?

The  Exponential  Function 

The idea of extending the definition of a function defined on $\mathbf{N}$ to all of $\mathbf{R}$ may at first sound like a somewhat whimsical enterprise, but it is perfectly analogous to the way we come to understand a function like $2^x$. Similar to $n!$, $2^n$ for $n \in \mathbf{N}$ is unambiguous and meaningful the minute we understand multiplication, but something like $2^{-\pi}$ is another matter. Because it is instructive, and because we are presently going to need functions of the form $t^x$, let's take a moment to define exponential functions in a rigorous way.

Typically the way a function like $2^x$ gets defined on $\mathbf{R}$ is through a series of domain expansions. Starting with $2^n$, we first expand the domain to $\mathbf{Z}$ using reciprocals, then to $\mathbf{Q}$ using roots, and finally to $\mathbf{R}$ using continuity. Although we could follow this strategy, we are going to take a different approach that has the advantage of yielding the important properties we need more efficiently.

Step one is to properly define the natural exponential function $e^x$. Back in Chapter 6, we assumed $e^x$ was already defined and showed how it could be represented by its Taylor series. Here we flip this process around. The problem on the table is to rigorously construct a proper definition for $e^x$, and the theory of power series gives us a bedrock foundation on which to build.

Define

$$E(x) = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots. \tag{1}$$


Exercise 8.4.2. Verify that the series converges absolutely for all $x \in \mathbf{R}$, that $E(x)$ is differentiable on $\mathbf{R}$, and $E'(x) = E(x)$.

Exercise 8.4.3. (a) Use the results of Exercise 2.8.7 and the binomial formula to show that $E(x + y) = E(x)E(y)$ for all $x, y \in \mathbf{R}$.

(b) Show that $E(0) = 1$, $E(-x) = 1/E(x)$, and $E(x) > 0$ for all $x \in \mathbf{R}$.

The takeaway here is that the power series $E(x)$ satisfies all the properties we associate with the exponential function, and we can therefore give ourselves permission to go back to the more familiar notation $e^x$ in place of $E(x)$. What happens if we have a momentary relapse and interpret $e^x$ as the real number $e \approx 2.71828\ldots$ raised to the power $x$ rather than $E(x)$? Not to worry: the two interpretations coincide, whenever the former is defined in the usual way.

Exercise 8.4.4. Define $e = E(1)$. Show $E(n) = e^n$ and $E(m/n) = (\sqrt[n]{e})^m$ for all $m, n \in \mathbf{Z}$.
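The properties claimed for $E(x)$ in Exercises 8.4.3 and 8.4.4 can be observed numerically from partial sums of the defining series (1). The Python sketch below is a minimal illustration. The cutoff of 60 terms is an arbitrary choice that is adequate only for moderate $|x|$, and the comparison with `math.e` is a check against the library constant rather than part of the construction.

```python
import math

def E(x, terms=60):
    """Partial sum of the series sum_{n >= 0} x^n / n! with `terms` terms."""
    total, term = 0.0, 1.0  # term holds x^n / n!, starting from n = 0
    for n in range(terms):
        total += term
        term *= x / (n + 1)
    return total

print(E(1.0), math.e)                 # E(1) agrees with the constant e
print(E(3.5), E(1.5) * E(2.0))        # E(x + y) versus E(x) E(y)
print(E(-2.0) * E(2.0))               # E(-x) E(x) is close to 1
```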


One final property of $e^x$ we need is its behavior as $x \to \pm\infty$.

Definition 8.4.1. Given $f : [a, \infty) \to \mathbf{R}$, we say that $\lim_{x \to \infty} f(x) = L$ if, for all $\epsilon > 0$, there exists $M > a$ such that whenever $x > M$ it follows that $|f(x) - L| < \epsilon$.

Exercise 8.4.5. Show $\lim_{x \to \infty} x^n e^{-x} = 0$ for all $n = 0, 1, 2, \ldots$.

To get started notice that when $x > 0$, all the terms in (1) are positive.


Other  Bases 

Having set $e^x$ on solid mathematical footing, we can now do the same for $t^x$ where $t > 0$. This requires use of the natural logarithm.

Exercise 8.4.6. (a) Explain why we know $e^x$ has an inverse function (let's call it $\log x$) defined on the strictly positive real numbers and satisfying

(i) $\log(e^y) = y$ for all $y \in \mathbf{R}$ and

(ii) $e^{\log x} = x$, for all $x > 0$.

(b) Prove $(\log x)' = 1/x$. (See Exercise 5.2.12.)

(c) Fix $y > 0$ and differentiate $\log(xy)$ with respect to $x$. Conclude that

$$\log(xy) = \log x + \log y \quad \text{for all } x, y > 0.$$

(d) For $t > 0$ and $n \in \mathbf{N}$, $t^n$ has the usual interpretation as $t \cdot t \cdots t$ ($n$ times). Show that

$$t^n = e^{n \log t} \quad \text{for all } n \in \mathbf{N}. \tag{2}$$

Part (d) of the previous exercise is the pivotal formula because the expression on the right of the equal sign is meaningful if we replace $n$ with $x \in \mathbf{R}$. This is our cue to use the identity in (2) as a template for the definition of $t^x$ on all of $\mathbf{R}$.

Definition 8.4.2. Given $t > 0$, define the exponential function $t^x$ to be

$$t^x = e^{x \log t} \quad \text{for all } x \in \mathbf{R}.$$

Exercise 8.4.7. (a) Show $t^{m/n} = (\sqrt[n]{t})^m$ for all $m, n \in \mathbf{N}$.

(b) Show $\log(t^x) = x \log t$, for all $t > 0$ and $x \in \mathbf{R}$.

(c) Show $t^x$ is differentiable on $\mathbf{R}$ and find the derivative.
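Definition 8.4.2 translates directly into a computation. The Python sketch below defines a hypothetical helper `power` using the library functions `math.exp` and `math.log` in place of $E$ and its inverse, and checks that it agrees with repeated multiplication and with roots in a few cases. It is an illustration of the definition, not a substitute for the proofs requested in Exercise 8.4.7.

```python
import math

def power(t, x):
    """t^x defined as exp(x * log t); the definition requires t > 0."""
    if t <= 0:
        raise ValueError("the definition requires t > 0")
    return math.exp(x * math.log(t))

print(power(2.0, 10), 2.0 * 2.0 * 2.0 * 2.0 * 2.0 * 2.0 * 2.0 * 2.0 * 2.0 * 2.0)
print(power(8.0, 2 / 3))        # the cube root of 8, squared
print(power(2.0, -math.pi))     # a value with no elementary description
```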

Finding the right definition for $x!$ is harder than defining $t^x$, but the strategy is essentially the same. We are seeking a formula of the form $n! = g(n)$ where $g$ yields a meaningful formula when $n$ is replaced by $x$. What might such a function $g(x) = x!$ look like when graphed over $\mathbf{R}$? For $x > 0$ it must grow extremely rapidly to keep up with $n!$, but how about on $x < 0$? Using a functional equation for $x!$ we can create a reasonable artist's rendering of the function we are looking for.


The  Functional  Equation 

A defining property of the factorial on $\mathbf{N}$ is that $1! = 1$ and $n! = n(n-1)!$ for all $n \ge 2$. Thus it seems reasonable to require the same from our currently mythic function $x!$ defined on $\mathbf{R}$. Whatever $x!$ means it should satisfy

$$x! = x(x-1)! \quad \text{for all } x \in \mathbf{R}.$$

Setting $x = 1$ in this equation, for example, yields $1 = 0!$.

Exercise 8.4.8. Inspired by the fact that $0! = 1$ and $1! = 1$, let $h(x)$ satisfy

(i) $h(x) = 1$ for all $0 \le x \le 1$, and

(ii) $h(x) = x h(x-1)$ for all $x \in \mathbf{R}$.

(a) Find a formula for $h(x)$ on $[1, 2]$, $[2, 3]$, and $[n, n+1]$ for arbitrary $n \in \mathbf{N}$.

(b) Now do the same for $[-1, 0]$, $[-2, -1]$, and $[-n, -n+1]$.

(c) Sketch $h$ over the domain $[-4, 4]$.
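For readers who want to check a sketch against computed values, conditions (i) and (ii) determine $h$ recursively. The Python function below is one possible reading of those two conditions, with the functional equation rewritten as $h(x) = h(x+1)/(x+1)$ to move leftward. It treats the negative integers as points where $h$ is undefined, which is an assumption consistent with the division by zero that (ii) forces there.

```python
def h(x):
    """Evaluate h using h = 1 on [0, 1] and h(x) = x * h(x - 1) elsewhere."""
    if 0 <= x <= 1:
        return 1.0
    if x > 1:
        return x * h(x - 1)
    # Here x < 0. Rearranging h(x + 1) = (x + 1) * h(x) moves toward [0, 1].
    if x == int(x):
        raise ValueError("h is not defined at negative integers")
    return h(x + 1) / (x + 1)

print([h(n) for n in range(1, 6)])          # matches n! at the integers
print(h(2.5), h(-0.5), h(-1.5), h(-2.5))    # sample values off the integers
```

Plotting these values over $[-4, 4]$ is a convenient way to compare against a hand-drawn answer to part (c).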
Notice that $h(x)$ satisfies $h(n) = n!$ and it is at least continuous for $x \ge 0$, but its piecewise definition and its many non-differentiable corners disqualify it from being our sought after factorial function. One legitimate conclusion that arises out of this exercise is that $x!$, when we find it, will exhibit the same asymptotic behavior as $h$ at $x = -1, -2, -3, \ldots$, and thus won't be defined on the negative integers.


Improper  Riemann  Integrals 

For reasons that will become clear, we need to make rigorous sense of an expression like

$$\int_0^{\infty} e^{-t} \, dt.$$

Most likely familiar from calculus, integrals over unbounded regions like $[0, \infty)$ are called improper Riemann integrals and are defined by taking the limit of “proper” integrals.

Definition 8.4.3. Assume $f$ is defined on $[a, \infty)$ and integrable on every interval of the form $[a, b]$. Then define $\int_a^{\infty} f$ to be

$$\lim_{b \to \infty} \int_a^b f,$$

provided the limit exists. In this case we say the improper integral $\int_a^{\infty} f$ converges.


Exercise 8.4.9. (a) Show that the improper integral $\int_a^{\infty} f$ converges if and only if, for all $\epsilon > 0$ there exists $M > a$ such that whenever $d > c \ge M$ it follows that

$$\left| \int_c^d f \right| < \epsilon.$$

(In one direction it will be useful to consider the sequence $a_n = \int_a^{a+n} f$.)

(b) Show that if $0 \le f \le g$ and $\int_a^{\infty} g$ converges then $\int_a^{\infty} f$ converges.

(c) Part (a) is a Cauchy criterion, and part (b) is a comparison test. State and prove an absolute convergence test for improper integrals.

Exercise 8.4.10. (a) Use the properties of $e^t$ previously discussed to show

$$\int_0^{\infty} e^{-t} \, dt = 1.$$

(b) Show

$$\frac{1}{a} = \int_0^{\infty} e^{-at} \, dt, \quad \text{for all } a > 0. \tag{3}$$
Just for a moment, let's take our analysis gloves off and ask what we think might happen if we differentiate formula (3) with respect to $a$.

On the left-hand side we certainly get

$$\left( \frac{1}{a} \right)' = -\frac{1}{a^2}.$$

On the right-hand side of (3), let's brazenly crash through the integral sign and take the derivative of the integrand $e^{-at}$ with respect to $a$ (thinking of $t$ as a constant). The result is

$$\left( e^{-at} \right)' = -t e^{-at}.$$

The question, then, is whether this is a valid manipulation. Is it true that

$$\frac{1}{a^2} = \int_0^{\infty} t e^{-at} \, dt \, ? \tag{4}$$

Well, let's compute the integral and find out.

Exercise 8.4.11. (a) Evaluate $\int_0^b t e^{-at} \, dt$ using the integration-by-parts formula from Exercise 7.5.6. The result will be an expression in $a$ and $b$.

(b) Now compute $\int_0^{\infty} t e^{-at} \, dt$ and verify equation (4).
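Equation (4) can also be checked numerically, independent of the integration by parts. The Python sketch below approximates $\int_0^b t e^{-at}\,dt$ with a composite Simpson's rule and compares the result with $1/a^2$ for a large cutoff $b$. Simpson's rule is a swapped-in numerical technique that the text does not use, and the helper names, the cutoff, and the panel count are all illustrative assumptions.

```python
import math

def simpson(g, lo, hi, panels=20000):
    """Composite Simpson's rule; `panels` must be even."""
    step = (hi - lo) / panels
    total = g(lo) + g(hi)
    for k in range(1, panels):
        total += (4 if k % 2 else 2) * g(lo + k * step)
    return total * step / 3

def truncated_integral(a, b):
    """Approximate the proper integral of t * exp(-a t) over [0, b]."""
    return simpson(lambda t: t * math.exp(-a * t), 0.0, b)

for a in (0.5, 1.0, 2.0):
    # With b = 40 the neglected tail is tiny for these values of a.
    print(a, truncated_integral(a, 40.0), 1 / a ** 2)
```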

Apparently,  our  bold  differentiation  of  equation  (3)  into  equation  (4)  worked 
out.  Now  it’s  time  to  put  our  analysis  gloves  back  on  and  see  why  this  is  so. 


Differentiating  Under  the  Integral 

Let $f(x, t)$ be a function of two variables, defined for all $a \le x \le b$ and $c \le t \le d$. The domain of $f$ is then a rectangle $D$ in $\mathbf{R}^2$.

What does it mean to say $f$ is continuous at a point $(x_0, t_0)$ in $D$? Section 8.2 on metric spaces gives a more thorough explanation, but the only real difference from the single variable setting is that we have to replace our sense of distance between points $(x_0, t_0)$ and $(x, t)$ with the familiar Euclidean distance formula

$$\| (x, t) - (x_0, t_0) \| = \sqrt{(x - x_0)^2 + (t - t_0)^2}.$$

Definition 8.4.4. A function $f : D \to \mathbf{R}$ is continuous at $(x_0, t_0)$ if for all $\epsilon > 0$, there exists $\delta > 0$ such that whenever $\| (x, t) - (x_0, t_0) \| < \delta$ it follows that

$$| f(x, t) - f(x_0, t_0) | < \epsilon.$$


Exercise 8.4.12. Assume the function $f(x, t)$ is continuous on the rectangle $D = \{ (x, t) : a \le x \le b,\ c \le t \le d \}$. Explain why the function

$$F(x) = \int_c^d f(x, t) \, dt$$

is properly defined for all $x \in [a, b]$.

It should not be too surprising that Theorem 4.4.7 has an analogue in the $\mathbf{R}^2$ setting. The set $D$ is compact in $\mathbf{R}^2$, and a continuous function on $D$ is uniformly continuous in the sense that the $\delta$ in Definition 8.4.4 can be chosen independently of the point $(x_0, t_0)$.

Theorem 8.4.5. If $f(x, t)$ is continuous on $D$, then $F(x) = \int_c^d f(x, t) \, dt$ is uniformly continuous on $[a, b]$.

Exercise  8.4.13.  Prove  Theorem  8.4.5. 


Taking inspiration from equations (3) and (4), let's add the assumption that for each fixed value of $t$ in $[c, d]$, the function $f(x, t)$ is a differentiable function of $x$; that is,

$$f_x(x, t) = \lim_{z \to x} \frac{f(z, t) - f(x, t)}{z - x}$$

exists for all $(x, t) \in D$. In addition, let's assume that the derivative function $f_x(x, t)$ is continuous.


Theorem 8.4.6. If $f(x, t)$ and $f_x(x, t)$ are continuous on $D$, then the function $F(x) = \int_c^d f(x, t) \, dt$ is differentiable and

$$F'(x) = \int_c^d f_x(x, t) \, dt.$$

Proof. Fix $x$ in $[a, b]$ and let $\epsilon > 0$ be arbitrary. Our task is to find a $\delta > 0$ such that

$$\left| \frac{F(z) - F(x)}{z - x} - \int_c^d f_x(x, t) \, dt \right| < \epsilon$$

whenever $0 < |z - x| < \delta$.

Exercise 8.4.14. Finish the proof of Theorem 8.4.6.

□
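The conclusion of Theorem 8.4.6 can be illustrated on a concrete rectangle. The Python sketch below takes the hypothetical example $f(x, t) = e^{-xt}$ with $0 \le t \le 3$, approximates $F$ and $\int_c^d f_x$ by Simpson's rule, and compares a symmetric difference quotient of $F$ with the integral of $f_x(x, t) = -t e^{-xt}$. The example function, the interval, and the step size are assumptions made for illustration and do not come from the text.

```python
import math

def simpson(g, lo, hi, panels=2000):
    """Composite Simpson's rule; `panels` must be even."""
    step = (hi - lo) / panels
    total = g(lo) + g(hi)
    for k in range(1, panels):
        total += (4 if k % 2 else 2) * g(lo + k * step)
    return total * step / 3

C, D_UPPER = 0.0, 3.0  # the t-interval [c, d] of the example rectangle

def F(x):
    """F(x) = integral over [c, d] of f(x, t) = exp(-x t)."""
    return simpson(lambda t: math.exp(-x * t), C, D_UPPER)

def integral_of_fx(x):
    """Integral over [c, d] of f_x(x, t) = -t exp(-x t)."""
    return simpson(lambda t: -t * math.exp(-x * t), C, D_UPPER)

def difference_quotient(x, step=1e-5):
    """Symmetric difference quotient approximating F'(x)."""
    return (F(x + step) - F(x - step)) / (2 * step)

print(difference_quotient(1.5), integral_of_fx(1.5))
```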


Improper  Integrals,  Revisited 

Theorem 8.4.6 is a formal justification for differentiating under the integral sign, but we need to extend this result to the case where the integral is improper. Looking back one more time to our motivating example in equation (3), we see that what we have is a function $f(x, t)$ where the domain of the variable $t$ is the unbounded interval $c \le t < \infty$.

Let's fix $x$ from some set $A \subseteq \mathbf{R}$. For such an $x$, we define

$$F(x) = \int_c^{\infty} f(x, t) \, dt = \lim_{d \to \infty} \int_c^d f(x, t) \, dt, \tag{6}$$

provided the limit exists.




Notice that the formula in (6) is a pointwise statement. Given an $x \in A$ and $\epsilon > 0$, we can find an $M$ (perhaps dependent on $x$) where

$$\left| F(x) - \int_c^d f(x, t) \, dt \right| < \epsilon$$

whenever $d \ge M$. As we have seen on numerous occasions, the elixir required to ensure that good behavior in the finite setting extends to the infinite setting is uniformity.

Definition 8.4.7. Given $f(x, t)$ defined on $D = \{ (x, t) : x \in A,\ c \le t \}$, assume $F(x) = \int_c^{\infty} f(x, t) \, dt$ exists for all $x \in A$. We say the improper integral converges uniformly to $F(x)$ on $A$ if for all $\epsilon > 0$, there exists $M > c$ such that

$$\left| F(x) - \int_c^d f(x, t) \, dt \right| < \epsilon$$

for all $d \ge M$ and all $x \in A$.

Exercise 8.4.15. (a) Show that the improper integral $\int_0^{\infty} e^{-xt} \, dt$ converges uniformly to $1/x$ on the set $[1/2, \infty)$.

(b) Is the convergence uniform on $(0, \infty)$?


Exercise 8.4.16. Prove the following analogue of the Weierstrass M-Test for improper integrals: If $f(x, t)$ satisfies $|f(x, t)| \le g(t)$ for all $x \in A$ and $\int_a^{\infty} g(t) \, dt$ converges, then $\int_a^{\infty} f(x, t) \, dt$ converges uniformly on $A$.

An immediate consequence of Definition 8.4.7 is that if the improper integral converges uniformly then the sequence of functions defined by

$$F_n(x) = \int_c^{c+n} f(x, t) \, dt$$

converges uniformly to $F(x)$ on $[a, b]$. This observation gives us access to the host of useful results we developed in Chapter 6.


Theorem 8.4.8. If $f(x, t)$ is continuous on $D = \{ (x, t) : a \le x \le b,\ c \le t \}$, then

$$F(x) = \int_c^{\infty} f(x, t) \, dt$$

is uniformly continuous on $[a, b]$, provided the integral converges uniformly.

Exercise 8.4.17. Prove Theorem 8.4.8.

Theorem 8.4.9. Assume the function $f(x, t)$ is continuous on $D = \{ (x, t) : a \le x \le b,\ c \le t \}$ and $F(x) = \int_c^{\infty} f(x, t) \, dt$ exists for each $x \in [a, b]$. If the derivative function $f_x(x, t)$ exists and is continuous, then

$$F'(x) = \int_c^{\infty} f_x(x, t) \, dt, \tag{7}$$

provided the integral in (7) converges uniformly.


Exercise  8.4.18.  Prove  Theorem  8.4.9. 
The Factorial Function

It's time to return our attention to equation (3) from earlier in this section:

$$\frac{1}{a} = \int_0^{\infty} e^{-at} \, dt \quad \text{for all } a > 0.$$

Exercise 8.4.19. (a) Although we verified it directly, show how to use the theorems in this section to give a second justification for the formula

$$\frac{1}{a^2} = \int_0^{\infty} t e^{-at} \, dt, \quad \text{for all } a > 0.$$

(b) Now derive the formula

$$\frac{n!}{a^{n+1}} = \int_0^{\infty} t^n e^{-at} \, dt, \quad \text{for all } a > 0. \tag{8}$$

If we set $a = 1$ in equation (8) we get

$$n! = \int_0^{\infty} t^n e^{-t} \, dt.$$

The appearance of $n!$ on the left side of this equation is an exciting development, especially because where $n$ appears on the right it can be meaningfully replaced by a real variable $x$, at least when $x \ge 0$. This is the equation we have been looking for!


Definition 8.4.10. For $x \ge 0$, define the factorial function

$$x! = \int_0^{\infty} t^x e^{-t} \, dt.$$

Exercise 8.4.20. (a) Show that $x!$ is an infinitely differentiable function on $(0, \infty)$ and produce a formula for the $n$th derivative. In particular show that $(x!)'' > 0$.

(b) Use the integration-by-parts formula employed earlier to show that $x!$ satisfies the functional equation

$$(x + 1)! = (x + 1)\, x!.$$

The previous exercise is our first piece of evidence that we have found the right definition for $x!$. There is more to come.
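Definition 8.4.10 can be tried out numerically. The Python sketch below approximates the improper integral by truncating at a finite upper limit and applying a composite Simpson's rule, then checks the values at a few integers and the functional equation at a non-integer point. The truncation point, the panel count, and the helper names are assumptions chosen for illustration; they are adequate only for small $x$.

```python
import math

def simpson(g, lo, hi, panels):
    """Composite Simpson's rule; `panels` must be even."""
    step = (hi - lo) / panels
    total = g(lo) + g(hi)
    for k in range(1, panels):
        total += (4 if k % 2 else 2) * g(lo + k * step)
    return total * step / 3

def factorial_integral(x, upper=60.0, panels=60000):
    """Approximate x! = integral of t^x e^(-t) over [0, infinity), cut at `upper`."""
    return simpson(lambda t: t ** x * math.exp(-t), 0.0, upper, panels)

print([factorial_integral(n) for n in range(5)])      # near 1, 1, 2, 6, 24
print(factorial_integral(3.5), 3.5 * factorial_integral(2.5))
```

The last line compares $(x+1)!$ with $(x+1)\,x!$ at $x = 2.5$, the functional equation from Exercise 8.4.20(b).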

A consequence of $(x!)'' > 0$ is that $x!$ is a convex function. In calculus this is usually referred to as “concave up” and means that the line segment connecting two points on the graph of $x!$ always sits above the curve. Said another way, there are no inflection points in $x!$ and the slope of the curve steadily increases as the graph passes through the points $(n, n!)$ for $n = 0, 1, 2, \ldots$. We did not mention this property at the time, but reflecting on our earlier analogy between $2^x$ and $x!$, convexity is a natural condition to desire in our factorial function.

[Figure 8.1: Increasing chord slopes on a convex function. Horizontal axis marks: $a$, $a'$, $b$, $b'$.]

In fact, not only is $x!$ convex but $\log(x!)$ is also convex. This is a stronger statement. (Consider, for instance, the graphs of $x^2 + 1$ and $\log(x^2 + 1)$.) The proof is a little technical and we won't go through it, but the fact that $\log(x!)$ is convex on $x \ge 0$ is quite significant. Here's why.


Theorem 8.4.11 (Bohr–Mollerup Theorem). There is a unique positive function $f$ defined on $x \ge 0$ satisfying

(i) $f(0) = 1$,

(ii) $f(x + 1) = (x + 1) f(x)$, and

(iii) $\log(f(x))$ is convex.

Because $x!$ satisfies properties (i), (ii), and (iii), it follows that $f(x) = x!$.

Proof. We need one more geometrically plausible fact about convex functions. If $[a, b]$ and $[a', b']$ are two intervals in the domain of a convex function $\phi$, and $a \le a'$ and $b \le b'$, then the slopes of the chords over these intervals satisfy

$$\frac{\phi(b) - \phi(a)}{b - a} \le \frac{\phi(b') - \phi(a')}{b' - a'}.$$

(See Figure 8.1.)

Because $f$ satisfies properties (i) and (ii) we know $f(n) = n!$ for all $n \in \mathbf{N}$. Now fix $n \in \mathbf{N}$ and $x \in (0, 1]$.


Exercise 8.4.21. (a) Use the convexity of $\log(f(x))$ and the three intervals $[n-1, n]$, $[n, n+x]$, and $[n, n+1]$ to show

$$x \log(n) \le \log(f(n + x)) - \log(n!) \le x \log(n + 1).$$

(b) Show $\log(f(n + x)) = \log(f(x)) + \log((x+1)(x+2) \cdots (x+n))$.

(c) Now establish that

$$0 \le \log(f(x)) - \log\left( \frac{n^x n!}{(x+1)(x+2) \cdots (x+n)} \right) \le x \log\left( 1 + \frac{1}{n} \right).$$

(d) Conclude that

$$f(x) = \lim_{n \to \infty} \frac{n^x n!}{(x+1)(x+2) \cdots (x+n)}, \quad \text{for all } x \in (0, 1].$$

(e) Finally, show that the conclusion in (d) holds for all $x \ge 0$.

Because we have arrived at an explicit formula for $f(x)$, the function $f(x)$ must be unique. By virtue of the fact that $x!$ satisfies conditions (i), (ii), and (iii) of the theorem, we can conclude that $x!$ is this unique function; i.e., $f(x) = x!$. Thus, not only have we proved the theorem, but we have also discovered an alternate representation for the factorial function called the Gauss product formula:

$$x! = \int_0^{\infty} t^x e^{-t} \, dt = \lim_{n \to \infty} \frac{n^x n!}{(x+1)(x+2) \cdots (x+n)}, \tag{9}$$

for all $x \ge 0$. □
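The Gauss product formula lends itself to a direct numerical check. The Python sketch below accumulates $n^x n! / ((x+1)(x+2)\cdots(x+n))$ one factor at a time, which avoids forming the huge number $n!$, and compares the result with `math.gamma(x + 1)`. The library function follows the convention $\Gamma(x+1) = x!$ and serves here as an independent reference value. The function name and the choice $n = 10^5$ are illustrative.

```python
import math

def gauss_product(x, n):
    """n^x * n! / ((x+1)(x+2)...(x+n)), accumulated factor by factor."""
    value = float(n) ** x
    for k in range(1, n + 1):
        value *= k / (x + k)  # contributes k from n! and (x + k) from the product
    return value

for x in (0.5, 1.0, 3.0):
    # Convergence is slow: the gap shrinks roughly in proportion to 1/n.
    print(x, gauss_product(x, 10 ** 5), math.gamma(x + 1))
```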

What happens if $x < 0$? The integral in Definition 8.4.10 becomes improper for a second reason when $x < 0$ because $t^x$ is unbounded and undefined at $t = 0$. If $-1 < x < 0$, it is not hard to show that the integral still converges. On the other hand, the functional equation in Exercise 8.4.20(b) provides a natural way to extend the definition of $x!$ to all of $\mathbf{R}$. Just as in Exercise 8.4.8, the resulting function is never zero, alternating between positive and negative components with vertical asymptotes at $x = -1, -2, -3, \ldots$.

The  Gamma  Function 

The focus of our discussion has been on the ingredients that go into the definition of $x!$ (improper integrals, proper definitions of exponential functions, differentiating under the integral sign), but the end result is a function worthy of its own separate chapter. Since its discovery by Euler, the factorial function has become ubiquitous in numerous branches of analysis.

One of the early modifications that occurred was a shift in the domain of $x!$ and a change in the notation. Adrien Marie Legendre introduced the Greek letter $\Gamma$ (gamma) and set

$$\Gamma(x) = (x - 1)! = \int_0^{\infty} t^{x-1} e^{-t} \, dt,$$

so that $\Gamma(n + 1) = n!$ and $x \Gamma(x) = \Gamma(x + 1)$. This convention eventually became the standard, and so it is the gamma function that routinely appears in formulas from number theory, probability, geometry, and beyond.
Philip Davis's article on the history of the gamma function (see [11]) is an excellent place to get a sense of the important role the gamma function has played in the development of analysis. Davis's essay seems to be at least part of the inspiration for a wonderful series of articles by David Fowler that explore the properties of $x!$ in an original and accessible way. Here is one of the anecdotes Fowler offers, which serves as an enticing clue for how intricately the gamma/factorial function is connected to the larger mathematical landscape.

Recall that when $x!$ is extended to all of $\mathbf{R}$ via the functional equation $x! = x(x - 1)!$ we get asymptotes at every negative integer. Thus, there is a compelling reason to consider the reciprocal function $1/x!$, which we can take to be zero for $x = -1, -2, -3, \ldots$.


Exercise 8.4.22. (a) Where does $g(x) = \frac{1}{x!\,(-x)!}$ equal zero? What other
familiar function has the same set of roots?

(b) The function $e^{-x^2}$ provides the raw material for the all-important Gaussian
bell curve from probability, where it is known that
$\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$. Use this fact (and some standard
integration techniques) to evaluate (1/2)!.

(c) Now use (a) and (b) to conjecture a striking relationship between the
factorial function and a well-known function from trigonometry.


Exercise 8.4.23. As a parting shot, use the value for (1/2)! and the Gauss
product formula in equation (9) to derive the famous product formula for π
discovered by John Wallis in the 1650s:

\[ \frac{\pi}{2} = \lim_{n \to \infty} \left(\frac{2 \cdot 2}{1 \cdot 3}\right) \left(\frac{4 \cdot 4}{3 \cdot 5}\right) \cdots \left(\frac{2n \cdot 2n}{(2n-1)(2n+1)}\right). \]
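
The Wallis product converges slowly, which is easy to see numerically. The following is a minimal sketch; the function name `wallis_partial_product` is an illustrative choice, and it does nothing more than multiply out the first n factors of the formula above.

```python
import math

def wallis_partial_product(n):
    """Product of (2k * 2k) / ((2k - 1)(2k + 1)) for k = 1, ..., n."""
    product = 1.0
    for k in range(1, n + 1):
        product *= (2 * k) * (2 * k) / ((2 * k - 1) * (2 * k + 1))
    return product

for n in (10, 100, 1000, 10000):
    approx = wallis_partial_product(n)
    # The gap to pi/2 shrinks, but only about as fast as 1/n.
    print(n, approx, math.pi / 2 - approx)
```

Each factor exceeds 1, so the partial products increase toward π/2 from below.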


8.5  Fourier  Series 

In his famous treatise, Théorie Analytique de la Chaleur (The Analytical Theory
of Heat), 1822, Joseph Fourier (1768-1830) boldly asserts, “Thus there is
no function f(x), or part of a function, which cannot be expressed by a
trigonometric series.”⁴

It is difficult to exaggerate the mathematical richness of this idea. It has been
convincingly argued by mathematical historians that the ensuing investigation
into the validity of Fourier’s conjecture was the fundamental catalyst for the
pursuit of rigor that characterizes 19th century mathematics. Power series had
been in wide use in the 150 years leading up to Fourier’s work, largely because
they behaved so well under the operations of calculus. A function expressed
as a power series is continuous, differentiable an infinite number of times, and
can be integrated and differentiated as though it were a polynomial. In the
presence of such agreeable behavior, there was no compelling reason for
mathematicians to formulate a more precise understanding of “limit” or “convergence”
because there were no arguments to resolve. Fourier’s successful application
of trigonometric series to the study of heat flow changed all of this. To
understand what the fuss was really about, we need to look more closely at what
Fourier was asserting, focusing individually on the terms “function,” “express,”
and “trigonometric series.”

² Exercise 8.4.1, as well as the insight of comparing the development of x! to 2^x, are
borrowed from this piece.

³ Exercise 8.4.8 is borrowed from Fowler’s treatment in [15].

⁴ Quoted passages in this section are taken from [9].


Trigonometric  Series 

The basic principle behind any series representation is to express a given
function f(x) as a sum of simpler functions. For power series, the component
functions are {1, x, x², x³, ...}, so that the series takes the form

\[ f(x) = \sum_{n=0}^{\infty} a_n x^n = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots. \]

A trigonometric series is a very different type of infinite series where the
functions

{1, cos(x), sin(x), cos(2x), sin(2x), cos(3x), sin(3x), ...}

serve as the components. Thus, a trigonometric series has the form

\[ \begin{aligned}
f(x) &= a_0 + a_1 \cos(x) + b_1 \sin(x) + a_2 \cos(2x) + b_2 \sin(2x) + a_3 \cos(3x) + \cdots \\
     &= a_0 + \sum_{n=1}^{\infty} a_n \cos(nx) + b_n \sin(nx).
\end{aligned} \]

The idea of representing a function in this way was not completely new when
Fourier first publicly proposed it in 1807. About 50 years earlier, Jean Le Rond
d’Alembert (1717-1783) published the partial differential equation

\[ \frac{\partial^2 u}{\partial x^2} = \frac{\partial^2 u}{\partial t^2} \tag{1} \]

as a means of describing the motion of a vibrating string. In this model, the
function u(x, t) represents the displacement of the string at time t ≥ 0 and at
some point x, which we will take to be in the interval [0, π]. Because the string
is understood to be attached at each end of this interval, we have

\[ u(0, t) = 0 \quad \text{and} \quad u(\pi, t) = 0 \tag{2} \]

for all values of t ≥ 0. Now, at t = 0, the string is displaced some initial amount,
and at the moment it is released we assume

\[ \frac{\partial u}{\partial t}(x, 0) = 0, \tag{3} \]

meaning that, although the string immediately starts to move, it is given no
initial velocity at any point. Finding a function u(x, t) that satisfies
equations (1), (2), and (3) is not too difficult.

Exercise 8.5.1. (a) Verify that

\[ u(x, t) = b_n \sin(nx) \cos(nt) \]

satisfies equations (1), (2), and (3) for any choice of n ∈ N and b_n ∈ R.
What goes wrong if n ∉ N?

(b) Explain why any finite sum of functions of the form given in part (a)
would also satisfy (1), (2), and (3). (Incidentally, it is possible to hear
the different solutions in (a) for values of n up to 4 or 5 by isolating the
harmonics on a well-made stringed instrument.)
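
A numerical sanity check is no substitute for the verification the exercise asks for, but it can be reassuring. The sketch below approximates the second derivatives in equation (1) with central differences and evaluates the residual u_xx − u_tt for a finite sum of modes. The step size h = 1e-3, the sample amplitudes, and all function names are arbitrary illustrative choices.

```python
import math

def mode(n, b):
    """u(x, t) = b * sin(n x) * cos(n t), as in Exercise 8.5.1(a)."""
    return lambda x, t: b * math.sin(n * x) * math.cos(n * t)

def superpose(*modes):
    """Finite sum of modes, as in Exercise 8.5.1(b)."""
    return lambda x, t: sum(m(x, t) for m in modes)

def second_difference(g, s, h=1e-3):
    """Central-difference approximation to g''(s)."""
    return (g(s + h) - 2.0 * g(s) + g(s - h)) / (h * h)

def wave_residual(u, x, t):
    """Approximate u_xx - u_tt at (x, t); close to 0 when equation (1) holds."""
    u_xx = second_difference(lambda s: u(s, t), x)
    u_tt = second_difference(lambda s: u(x, s), t)
    return u_xx - u_tt

def initial_velocity(u, x, h=1e-3):
    """Central-difference approximation to du/dt at (x, 0), for equation (3)."""
    return (u(x, h) - u(x, -h)) / (2.0 * h)

# Hypothetical sample: three modes with arbitrarily chosen amplitudes.
u = superpose(mode(1, 1.0), mode(2, -0.5), mode(3, 2.0))
for x, t in [(0.3, 0.1), (1.0, 2.0), (2.5, 0.7)]:
    print(f"(x, t) = ({x}, {t}): residual = {wave_residual(u, x, t):.2e}")
print("boundary values:", u(0.0, 1.3), u(math.pi, 1.3))
print("initial velocity at x = 1:", initial_velocity(u, 1.0))
```

The residuals are not exactly zero because of finite-difference and rounding error; they only indicate that nothing is grossly wrong.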

Now, we come to the truly interesting issue. We have just seen that any
function of the form

\[ u(x, t) = \sum_{n=1}^{N} b_n \sin(nx) \cos(nt) \tag{4} \]

solves d’Alembert’s wave equation, as it is called, but the particular solution we
want depends on how the string is originally “plucked.” At time t = 0, we will
assume that the string is given some initial displacement f(x) = u(x, 0). Setting
t = 0 in our family of solutions in (4), the hope is that the initial displacement
function f(x) can be expressed as

\[ f(x) = \sum_{n=1}^{N} b_n \sin(nx). \tag{5} \]

What this means is that if there exist suitable coefficients b_1, b_2, ..., b_N so that
f(x) can be written as a sum of sine functions as in (5), then the vibrating-string
problem is completely solved by the function u(x, t) given in (4). The obvious
question to ask, then, is just what types of functions can be constructed as
linear combinations of the functions {sin(x), sin(2x), sin(3x), ...}. How general
can f(x) be? Daniel Bernoulli (1700-1782) is usually credited with proposing
the idea that by taking an infinite sum in equation (5), it may be possible to
represent any initial position f(x) over the interval [0, π].

Fourier  was  studying  the  propagation  of  heat  when  trigonometric  series 
resurfaced  in  his  work  in  a  very  similar  way.  For  Fourier,  f(x)  represented 
an initial temperature applied to the boundary of some heat-conducting
material. The differential equations describing heat flow are slightly different from
d’Alembert’s  wave  equation,  but  they  still  involve  the  second  derivatives  that 
make  expressing  f(x)  as  a  sum  of  trigonometric  functions  the  crucial  step  in 
finding  a  solution. 

Periodic  Functions 

In the early stages of his work, Fourier focused his attention on even functions
(i.e., functions satisfying f(x) = f(−x)) and sought out ways to represent them
as series of the form $\sum a_n \cos(nx)$. Eventually, he arrived at the more general
formulation of the problem, which is to find suitable coefficients (a_n) and (b_n)
to express a function f(x) as

\[ f(x) = a_0 + \sum_{n=1}^{\infty} a_n \cos(nx) + b_n \sin(nx). \tag{6} \]

As we begin to explore how arbitrary f(x) can be, it is important to notice
that every component of the series in equation (6) is periodic with period 2π.
Turning our attention to the term “function,” it now follows that any function
we hope to represent by a trigonometric series will necessarily be periodic as
well. We will give primary attention to the interval (−π, π]. What this means
is that, given a function such as f(x) = x², we will restrict our attention to f
over the domain (−π, π] and then extend f periodically to all of R via the rule
f(x) = f(x + 2kπ) for all k ∈ Z (Fig. 8.2).

[Figure 8.2]

This convention of focusing on just the part of f(x) over the interval (−π, π]
hardly seems controversial, but it did generate some confusion in Fourier’s time.
In Sections 1.2 and 4.1, we alluded to the fact that in the early 1800s the term
“function” was used to mean something more like “formula.” It was generally
believed that a function’s behavior over the interval (−π, π] determined its
behavior everywhere else, a point of view that follows naturally from an overly
zealous faith in Taylor series. The modern definition of function given in
Definition 1.2.3 is attributed to Dirichlet from the 1830s, although the idea had
been suggested earlier by others. In Théorie Analytique de la Chaleur, Fourier
clarifies his own use of the term by stating that a “function f(x) represents a
succession of values or ordinates, each of which is arbitrary. . . . We do not
suppose these ordinates to be subject to a common law; they succeed each other in
any manner whatever, and each of them is given as if it were a single quantity.”

In  the  end,  we  will  need  to  make  a  few  assumptions  about  the  nature  of 
our  functions,  but  the  requirements  we  will  need  are  quite  mild,  especially 
when compared with restrictions such as “infinitely differentiable,” which are
necessary, but not sufficient, for the existence of a Taylor series representation.

Types of Convergence

This brings us to a discussion of the word “expressed.” The assumptions we
must ultimately place on our function depend on the kind of convergence we
aim to demonstrate. How are we to understand the equal sign in equation (6)?
Our usual course of action with infinite series is first to define the partial sum

\[ S_N(x) = a_0 + \sum_{n=1}^{N} a_n \cos(nx) + b_n \sin(nx). \tag{7} \]

To “express f(x) as a trigonometric series” then means finding coefficients
$(a_n)_{n=0}^{\infty}$ and $(b_n)_{n=1}^{\infty}$ so that

\[ f(x) = \lim_{N \to \infty} S_N(x). \tag{8} \]

The question remains as to what kind of limit this is. Fourier probably imagined
something akin to a pointwise limit because the concept of uniform convergence
had not yet been formulated. In addition to pointwise convergence and uniform
convergence, there are still other ways to interpret the limit in equation (8).
Although it won’t be discussed here, it turns out that proving

\[ \int_{-\pi}^{\pi} |S_N(x) - f(x)|^2 \, dx \to 0 \]

is a natural way to understand equation (8) for a particular class of functions.
This is referred to as $L^2$ convergence. An alternate type of convergence that we
will discuss, called Cesàro mean convergence, relies on demonstrating that the
averages of the partial sums converge, in our case uniformly, to f(x).


Fourier  Coefficients 


In  the  discussion  that  follows,  we  are  going  to  need  a  few  calculus  facts. 

Exercise 8.5.2. Using trigonometric identities when necessary, verify the
following integrals.

(a) For all n ∈ N,

\[ \int_{-\pi}^{\pi} \cos(nx)\,dx = 0 \quad \text{and} \quad \int_{-\pi}^{\pi} \sin(nx)\,dx = 0. \]

(b) For all n ∈ N,

\[ \int_{-\pi}^{\pi} \cos^2(nx)\,dx = \pi \quad \text{and} \quad \int_{-\pi}^{\pi} \sin^2(nx)\,dx = \pi. \]

(c) For all m, n ∈ N,

\[ \int_{-\pi}^{\pi} \cos(mx) \sin(nx)\,dx = 0. \]

For m ≠ n,

\[ \int_{-\pi}^{\pi} \cos(mx) \cos(nx)\,dx = 0 \quad \text{and} \quad \int_{-\pi}^{\pi} \sin(mx) \sin(nx)\,dx = 0. \]

The  consequences  of  these  results  are  much  more  interesting  than  their 
proofs.  The  intuition  from  inner-product  spaces  is  useful.  Interpreting  the 
integral  as  a  kind  of  dot  product,  this  exercise  can  be  summarized  by  saying 
that  the  functions 


{1,  cos(x),  sin(x),  cos(2x),  sin(2x),  cos(3x), . . .  } 


are  all  orthogonal  to  each  other.  The  content  of  what  follows  is  that  they  in 
fact  form  a  basis  for  a  large  class  of  functions. 
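
The dot-product interpretation can be explored numerically. The sketch below approximates the integral over [−π, π] with a midpoint rule and evaluates the "inner product" of a few pairs from the list above. The names `integrate`, `inner`, `cos_n`, and `sin_n` are illustrative helpers, and the number of subintervals is an arbitrary choice.

```python
import math

def integrate(g, steps=2000):
    """Midpoint-rule approximation to the integral of g over [-pi, pi]."""
    h = 2.0 * math.pi / steps
    return h * sum(g(-math.pi + (k + 0.5) * h) for k in range(steps))

def inner(g1, g2):
    """Integral of g1 * g2 over [-pi, pi], playing the role of a dot product."""
    return integrate(lambda x: g1(x) * g2(x))

def one(x):
    return 1.0

def cos_n(n):
    return lambda x: math.cos(n * x)

def sin_n(n):
    return lambda x: math.sin(n * x)

pairs = {
    "<1, cos 3x>": (one, cos_n(3)),
    "<cos 2x, sin 2x>": (cos_n(2), sin_n(2)),
    "<cos x, cos 4x>": (cos_n(1), cos_n(4)),
    "<sin 3x, sin 7x>": (sin_n(3), sin_n(7)),
    "<cos 2x, cos 2x>": (cos_n(2), cos_n(2)),
    "<sin 5x, sin 5x>": (sin_n(5), sin_n(5)),
}
for label, (g1, g2) in pairs.items():
    print(f"{label:>18} = {inner(g1, g2):.10f}")
```

Distinct components give values that vanish up to rounding, while each cos(nx) or sin(nx) paired with itself gives π, matching Exercise 8.5.2.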

The first order of business is to deduce some reasonable candidates for the
coefficients (a_n) and (b_n) in equation (6). Given a function f(x), the trick is
to assume we are in possession of a representation described in (6) and then
manipulate this equation in a way that leads to formulas for (a_n) and (b_n).
This is exactly how we proceeded with Taylor series expansions in Section 6.6.
Taylor’s formula for the coefficients was produced by repeatedly differentiating
each side of the desired representation equation. Here, we integrate.

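To compute a_0, integrate each side of equation (6) from −π to π, brazenly
take the integral inside the infinite sum, and use Exercise 8.5.2 to get

\[ \begin{aligned}
\int_{-\pi}^{\pi} f(x)\,dx
  &= \int_{-\pi}^{\pi} \left[ a_0 + \sum_{n=1}^{\infty} a_n \cos(nx) + b_n \sin(nx) \right] dx \\
  &= \int_{-\pi}^{\pi} a_0\,dx + \sum_{n=1}^{\infty} \int_{-\pi}^{\pi} \left[ a_n \cos(nx) + b_n \sin(nx) \right] dx \\
  &= a_0 (2\pi) + \sum_{n=1}^{\infty} \left( a_n \cdot 0 + b_n \cdot 0 \right) = a_0 (2\pi).
\end{aligned} \]

Thus,

\[ a_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\,dx. \tag{9} \]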

The switching of the sum and the integral sign in the second step of the previous
calculation should rightly raise some eyebrows, but keep in mind that we are
really working backward from a hypothetical representation for f(x) to get a
proposal for what a_0 should be. The point is not to justify the derivation of the
formula but rather to show that using this value for a_0 ultimately gives us the
representation we want. That hard work lies ahead.

Now, consider a fixed m ≥ 1. To compute a_m, we first multiply each side of
equation (6) by cos(mx) and again integrate over the interval [−π, π].

Exercise 8.5.3. Derive the formulas

\[ a_m = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos(mx)\,dx \quad \text{and} \quad b_m = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin(mx)\,dx \tag{10} \]

for all m ≥ 1.
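
Formulas (9) and (10) translate directly into a numerical recipe. The sketch below approximates each integral with a midpoint rule; the helper names, the number of subintervals, and the sample function f(x) = x² are illustrative assumptions rather than anything prescribed by the text.

```python
import math

def integrate(g, steps=4000):
    """Midpoint-rule approximation to the integral of g over [-pi, pi]."""
    h = 2.0 * math.pi / steps
    return h * sum(g(-math.pi + (k + 0.5) * h) for k in range(steps))

def fourier_coefficients(f, N, steps=4000):
    """Return lists a, b with a[0] = a_0 from (9) and a[n], b[n] from (10)."""
    a = [integrate(f, steps) / (2.0 * math.pi)]
    b = [0.0]  # placeholder so that b[n] lines up with n; there is no b_0
    for n in range(1, N + 1):
        a.append(integrate(lambda x: f(x) * math.cos(n * x), steps) / math.pi)
        b.append(integrate(lambda x: f(x) * math.sin(n * x), steps) / math.pi)
    return a, b

def partial_sum(a, b, x):
    """S_N(x) = a_0 + sum_{n=1}^{N} a_n cos(nx) + b_n sin(nx), with N = len(a) - 1."""
    return a[0] + sum(a[n] * math.cos(n * x) + b[n] * math.sin(n * x)
                      for n in range(1, len(a)))

# Illustrative choice: f(x) = x^2 on (-pi, pi], extended periodically.
a, b = fourier_coefficients(lambda x: x * x, 20)
print("a_0, a_1, a_2:", a[0], a[1], a[2])
print("largest |b_n|:", max(abs(v) for v in b))
print("S_20(1.0) versus f(1.0):", partial_sum(a, b, 1.0), 1.0)
```

Because x² is even, the sine coefficients come out as zero up to rounding, which is the same symmetry argument used in the example that follows.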

Let’s  take  a  short  break  and  empirically  test  our  recipes  for  (am)  and  (bm) 
on  a  few  simple  functions. 

Example 8.5.1. Let

\[ f(x) = \begin{cases} 1 & \text{if } 0 < x < \pi \\ 0 & \text{if } x = 0 \text{ or } x = \pi \\ -1 & \text{if } -\pi < x < 0. \end{cases} \]

The fact that f is an odd function (i.e., f(−x) = −f(x)) means we can avoid
doing any integrals for the moment and just appeal to a symmetry argument to
conclude

\[ a_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\,dx = 0 \quad \text{and} \quad a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos(nx)\,dx = 0 \]

for all n ≥ 1. We can also simplify the integral for b_n by writing

\[ \begin{aligned}
b_n &= \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin(nx)\,dx = \frac{2}{\pi} \int_{0}^{\pi} \sin(nx)\,dx \\
    &= \frac{2}{\pi} \left( -\frac{1}{n} \cos(nx) \Big|_{0}^{\pi} \right)
     = \begin{cases} 4/(n\pi) & \text{if } n \text{ is odd} \\ 0 & \text{if } n \text{ is even.} \end{cases}
\end{aligned} \]

Proceeding on blind faith, we plug these results into equation (6) to get the
representation

\[ f(x) = \frac{4}{\pi} \sum_{n=0}^{\infty} \frac{1}{2n+1} \sin((2n+1)x). \]

A graph of a few of the partial sums of this series (Fig. 8.3) should generate
some optimism about the legitimacy of what is happening.
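
In place of a graph, the partial sums of the series in Example 8.5.1 can simply be evaluated at a few points. The sketch below is a small numerical experiment with illustrative function names; it makes no claim about the mode of convergence, which is the subject of the exercise that follows.

```python
import math

def square_wave(x):
    """The function f from Example 8.5.1 on (-pi, pi]."""
    if 0.0 < x < math.pi:
        return 1.0
    if -math.pi < x < 0.0:
        return -1.0
    return 0.0  # x = 0 or x = pi

def partial_sum(x, terms):
    """(4/pi) * sum_{n=0}^{terms-1} sin((2n+1)x) / (2n+1)."""
    return (4.0 / math.pi) * sum(
        math.sin((2 * n + 1) * x) / (2 * n + 1) for n in range(terms))

for x in (math.pi / 2, 1.0, -1.0, 0.0):
    values = [partial_sum(x, terms) for terms in (1, 10, 100, 1000)]
    print(f"x = {x:+.4f}, f(x) = {square_wave(x):+.1f}, partial sums:",
          [round(v, 4) for v in values])
```

At x = 0 every partial sum is exactly 0, matching f(0) = 0, while at the other sample points the values drift toward ±1 as more terms are included.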


Exercise 8.5.4. (a) Referring to the previous example, explain why we can
be sure that the convergence of the partial sums to f(x) is not uniform
on any interval containing 0.

(b) Repeat the computations of Example 8.5.1 for the function g(x) = |x|
and examine graphs for some partial sums. This time, make use of the
fact that g is even (g(x) = g(−x)) to simplify the calculations. By just
looking at the coefficients, how do we know this series converges uniformly
to something?

(c) Use graphs to collect some empirical evidence regarding the question of
term-by-term differentiation in our two examples to this point. Is it
possible to conclude convergence or divergence of either differentiated series
by looking at the resulting coefficients? Theorem 6.4.3 is about the
legitimacy of term-by-term differentiation. Can it be applied to either of these
examples?


The Riemann-Lebesgue Lemma

In the examples we have seen to this point, the sequences of Fourier coefficients
(a_n) and (b_n) all tend to 0 as n → ∞. This is always the case. Understanding
why this happens is crucial to our upcoming convergence proof.

We start with a simple observation. The reason

\[ \int_{-\pi}^{\pi} \sin(x)\,dx = 0 \]

is that the positive and negative portions of the sine curve cancel each other
out. The same is true of

\[ \int_{-\pi}^{\pi} \sin(nx)\,dx = 0. \]

Now, when n is large, the period of the oscillations of sin(nx) becomes very
short, 2π/n to be precise. If h(x) is a continuous function, then the values
of h do not vary too much as sin(nx) ranges over each short period. The
result is that the successive positive and negative oscillations of the product
h(x) sin(nx) (Fig. 8.4) are nearly the same size so that the cancellation leads to
a small value for

\[ \int_{-\pi}^{\pi} h(x) \sin(nx)\,dx. \]

Theorem 8.5.2 (Riemann-Lebesgue Lemma). Assume h(x) is continuous
on (−π, π]. Then,

\[ \int_{-\pi}^{\pi} h(x) \sin(nx)\,dx \to 0 \quad \text{and} \quad \int_{-\pi}^{\pi} h(x) \cos(nx)\,dx \to 0 \]

as n → ∞.

Proof. Remember that, like all of our functions from here on, we are mentally
extending h to be 2π-periodic. Thus, while our attention is generally focused
on the interval (−π, π], the assumption of continuity is intended to mean that
the periodically extended h is continuous on all of R. Note that in addition to
continuity on (−π, π], this amounts to insisting that $\lim_{x \to -\pi^+} h(x) = h(\pi)$.


Exercise 8.5.5. Explain why h is uniformly continuous on R.

Given ε > 0, choose δ > 0 such that |x − y| < δ implies |h(x) − h(y)| < ε/2. The
period of sin(nx) is 2π/n, so choose N large enough so that π/n < δ whenever
n ≥ N. Now, consider a particular interval [a, b] of length 2π/n over which
sin(nx) moves through one complete oscillation.


Exercise 8.5.6. Show that $\left| \int_a^b h(x) \sin(nx)\,dx \right| < \varepsilon/n$, and use this fact to
complete the proof.

□
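
The decay promised by the Riemann-Lebesgue Lemma can be watched numerically. In the sketch below, the function h(x) = |x|(1 + ½ sin x) is an arbitrary illustrative choice whose periodic extension is continuous (it takes the same value at −π and π); the midpoint rule and its step count are likewise assumptions made only for this experiment.

```python
import math

def integrate(g, steps=20000):
    """Midpoint-rule approximation to the integral of g over [-pi, pi]."""
    step = 2.0 * math.pi / steps
    return step * sum(g(-math.pi + (k + 0.5) * step) for k in range(steps))

def h(x):
    """Illustrative continuous function with h(-pi) = h(pi)."""
    return abs(x) * (1.0 + 0.5 * math.sin(x))

def sine_integral(n):
    """Approximate integral of h(x) sin(nx) over [-pi, pi]."""
    return integrate(lambda x: h(x) * math.sin(n * x))

def cosine_integral(n):
    """Approximate integral of h(x) cos(nx) over [-pi, pi]."""
    return integrate(lambda x: h(x) * math.cos(n * x))

for n in (1, 2, 4, 8, 16, 32, 64):
    print(f"n = {n:3d}: sin integral = {sine_integral(n):+.6f}, "
          f"cos integral = {cosine_integral(n):+.6f}")
```

The printed values shrink as n grows, which is consistent with the lemma but of course proves nothing about arbitrary continuous h.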


Applications of Fourier series are not restricted to continuous functions
(Example 8.5.1). Even though our particular proof makes use of continuity, the
Riemann-Lebesgue Lemma holds under much weaker hypotheses. It is true,
however, that any proof of this fact ultimately takes advantage of the
cancellation of positive and negative components. Recall from Chapter 2 that this type
of cancellation is the mechanism that distinguishes conditional convergence from
absolute convergence. In the end, what we discover is that, unlike power series,
Fourier series can converge conditionally. This makes them less robust, perhaps,
but more versatile and capable of more interesting behavior.


A  Pointwise  Convergence  Proof 

Let’s return once more to Fourier’s claim that every “function” can be
“expressed” as a trigonometric series. Our recipe for the Fourier coefficients in
equations (9) and (10) implicitly requires that our function be integrable. This
is the major motivation for Riemann’s modification of Cauchy’s definition of
the integral. Because integrability is a prerequisite for producing a Fourier
series, we would like the class of integrable functions to be as large as possible.
The natural question to ask now is whether Riemann integrability is enough
or whether we need to make some additional assumptions about f in order to
guarantee that the Fourier series converges back to f. The answer depends on
the type of convergence we hope to establish.

\[ f(x) = a_0 + \sum_{n=1}^{\infty} a_n \cos(nx) + b_n \sin(nx) \]

Types of convergence: pointwise convergence, uniform convergence, $L^2$ convergence,
Cesàro mean convergence.

Assumptions on f: bounded, integrable, continuous, differentiable, f′ continuous.

There is no tidy way to summarize the situation. For pointwise convergence,
integrability is not enough. At present, “integrable” for us means
Riemann-integrable, which we have only rigorously defined for bounded functions. In
1966, Lennart Carleson proved (via an extremely complicated argument) that
the Fourier series for such a function converges pointwise at every point in
the domain excluding possibly a set of measure zero. This term surfaced in our
discussion of the Cantor set (Section 3.1) and is defined rigorously in Section 7.6.
Sets of measure zero are small in one sense, but they can be uncountable, and
there are examples of continuous functions with Fourier series that diverge at
uncountably many points. Lebesgue’s modification of Riemann’s integral in
1901 proved to be a much more natural setting for Fourier analysis. Carleson’s
proof is really about Lebesgue-integrable functions which are allowed to be
unbounded but for which $\int_{-\pi}^{\pi} |f|^2$ is finite. One of the cleanest theorems in
this area states that, for this class of square Lebesgue-integrable functions, the
Fourier series always converges to the function from which it was derived if
we interpret convergence in the $L^2$ sense described earlier. As a final warning
about how fragile the situation is, there is an example due to A. Kolmogorov
(1903-1987) of a Lebesgue-integrable function where the Fourier series fails to
converge at any point.

Although  all  of  these  results  require  significantly  more  background  to  pursue 
in  any  rigorous  way,  we  are  in  a  position  to  prove  some  important  theorems  that 
require  a  few  extra  assumptions  about  the  function  in  question.  We  will  content 
ourselves  with  two  interesting  results  in  this  area. 


Theorem 8.5.3. Let f(x) be continuous on (−π, π], and let S_N(x) be the Nth
partial sum of the Fourier series described in equation (7), where the coefficients
(a_n) and (b_n) are given by equations (9) and (10). It follows that

\[ \lim_{N \to \infty} S_N(x) = f(x) \]

pointwise at any x ∈ (−π, π] where f′(x) exists.


Proof. Cataloging a few preliminary facts makes for a smoother argument.

Fact 1: (a) cos(α − θ) = cos(α) cos(θ) + sin(α) sin(θ).
(b) sin(α + θ) = sin(α) cos(θ) + cos(α) sin(θ).

Fact 2: For any θ ≠ 2nπ,

\[ \frac{1}{2} + \cos(\theta) + \cos(2\theta) + \cos(3\theta) + \cdots + \cos(N\theta) = \frac{\sin((N + 1/2)\theta)}{2 \sin(\theta/2)}. \]

Facts 1(a) and 1(b) are familiar trigonometric identities. Fact 2 is not as
familiar. Its proof (which we omit) is most easily derived by taking the real part
of a geometric sum of complex exponentials. The function in Fact 2 is called the
Dirichlet kernel in honor of the mathematician responsible for the first rigorous
convergence proof of this kind. Integrating both sides of this identity leads to
our next important fact.

Fact 3: Setting

\[ D_N(\theta) = \begin{cases} \dfrac{\sin((N + 1/2)\theta)}{2 \sin(\theta/2)} & \text{if } \theta \neq 2n\pi \\[2ex] 1/2 + N & \text{if } \theta = 2n\pi \end{cases} \]

from Fact 2, we see that

\[ \int_{-\pi}^{\pi} D_N(\theta)\,d\theta = \pi. \]

Although we will not restate it, the last fact we will use is the
Riemann-Lebesgue Lemma.
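
Facts 2 and 3 are easy to spot-check numerically. The sketch below compares the cosine sum with the closed form of the Dirichlet kernel at a few sample angles and approximates the integral in Fact 3 with a midpoint rule. The function names, sample angles, and the tolerance used to detect multiples of 2π are illustrative assumptions.

```python
import math

def dirichlet_sum(N, theta):
    """Left side of Fact 2: 1/2 + cos(theta) + ... + cos(N * theta)."""
    return 0.5 + sum(math.cos(n * theta) for n in range(1, N + 1))

def dirichlet_kernel(N, theta):
    """D_N(theta) as defined in Fact 3."""
    s = math.sin(theta / 2.0)
    if abs(s) < 1e-12:  # theta is numerically a multiple of 2*pi
        return 0.5 + N
    return math.sin((N + 0.5) * theta) / (2.0 * s)

def integrate(g, steps=2000):
    """Midpoint-rule approximation to the integral of g over [-pi, pi]."""
    h = 2.0 * math.pi / steps
    return h * sum(g(-math.pi + (k + 0.5) * h) for k in range(steps))

for N in (1, 6, 16):
    gap = max(abs(dirichlet_sum(N, t) - dirichlet_kernel(N, t))
              for t in (-3.0, -1.2, 0.0, 0.4, 2.9))
    area = integrate(lambda t: dirichlet_kernel(N, t))
    print(f"N = {N:2d}: largest gap = {gap:.2e}, integral = {area:.10f}")
```

The integral stays at π for every N even though the kernel itself grows taller and oscillates faster, which is the tension the rest of the proof has to manage.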

Fix a point x ∈ (−π, π]. The first step is to simplify the expression for
S_N(x). Now x is a fixed constant at the moment, so we will write the integrals
in equations (9) and (10) using t as the variable of integration. Keeping an eye
on Facts 1(a) and 2, we get that

\[ \begin{aligned}
S_N(x) &= a_0 + \sum_{n=1}^{N} a_n \cos(nx) + b_n \sin(nx) \\
 &= \left[ \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t)\,dt \right]
    + \sum_{n=1}^{N} \left[ \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \cos(nt)\,dt \right] \cos(nx)
    + \sum_{n=1}^{N} \left[ \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \sin(nt)\,dt \right] \sin(nx) \\
 &= \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \left[ \frac{1}{2} + \sum_{n=1}^{N} \cos(nt) \cos(nx) + \sin(nt) \sin(nx) \right] dt \\
 &= \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \left[ \frac{1}{2} + \sum_{n=1}^{N} \cos(nt - nx) \right] dt \\
 &= \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) D_N(t - x)\,dt.
\end{aligned} \]

As one final simplification, let u = t − x. Then,

\[ S_N(x) = \frac{1}{\pi} \int_{-\pi - x}^{\pi - x} f(u + x) D_N(u)\,du = \frac{1}{\pi} \int_{-\pi}^{\pi} f(u + x) D_N(u)\,du. \]

The last equality is a result of our agreement to extend f to be 2π-periodic.
Because D_N is also periodic (it is the sum of cosine functions), it does not
matter over what interval we compute the integral as long as we cover exactly
one full period.
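
The passage from the coefficient form of S_N(x) to the kernel form is a finite rearrangement, so the two expressions should agree numerically when both are computed with the same quadrature rule. The sketch below checks this for an illustrative f, N, and x; all names and parameter values are assumptions made for the experiment.

```python
import math

def integrate(g, steps=2000):
    """Midpoint-rule approximation to the integral of g over [-pi, pi]."""
    h = 2.0 * math.pi / steps
    return h * sum(g(-math.pi + (k + 0.5) * h) for k in range(steps))

def dirichlet(N, theta):
    """D_N(theta) written as the cosine sum 1/2 + cos(theta) + ... + cos(N theta)."""
    return 0.5 + sum(math.cos(n * theta) for n in range(1, N + 1))

def partial_sum_from_coefficients(f, N, x):
    """S_N(x) built from a_0, a_n, b_n as in equations (7), (9), and (10)."""
    total = integrate(f) / (2.0 * math.pi)
    for n in range(1, N + 1):
        a_n = integrate(lambda t: f(t) * math.cos(n * t)) / math.pi
        b_n = integrate(lambda t: f(t) * math.sin(n * t)) / math.pi
        total += a_n * math.cos(n * x) + b_n * math.sin(n * x)
    return total

def partial_sum_from_kernel(f, N, x):
    """S_N(x) as (1/pi) times the integral of f(t) D_N(t - x) over [-pi, pi]."""
    return integrate(lambda t: f(t) * dirichlet(N, t - x)) / math.pi

f = lambda t: t * t  # illustrative choice
print(partial_sum_from_coefficients(f, 6, 0.7))
print(partial_sum_from_kernel(f, 6, 0.7))
```

Agreement here only confirms the algebra of the rearrangement; whether S_N(x) approaches f(x) is a separate question taken up next.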

To prove S_N(x) → f(x), we must show that |S_N(x) − f(x)| gets arbitrarily
small when N gets large. Having expressed S_N(x) as an integral involving
D_N(u), we are motivated to do a similar thing for f(x). By Fact 3,

\[ f(x) = f(x) \frac{1}{\pi} \int_{-\pi}^{\pi} D_N(u)\,du = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) D_N(u)\,du, \]

and it follows that

\[ S_N(x) - f(x) = \frac{1}{\pi} \int_{-\pi}^{\pi} \left( f(u + x) - f(x) \right) D_N(u)\,du. \tag{11} \]

Our goal is to show this quantity tends to zero as N → ∞. A sketch of
D_N(u) (Fig. 8.5) for a few values of N reveals why this might happen. For large
N, the Dirichlet kernel D_N(u) has a tall, thin spike around u = 0, but this is
precisely where f(u + x) − f(x) is small (because f is continuous). Away from
zero, D_N(u) exhibits the fast oscillations that hearken back to the
Riemann-Lebesgue Lemma (Theorem 8.5.2). Let’s see how to use this theorem to finish
the argument.

Figure 8.5: D_6(u) and D_16(u).


Using Fact 1(b), we can rewrite the Dirichlet kernel as

\[ D_N(u) = \frac{\sin((N + 1/2)u)}{2 \sin(u/2)} = \frac{1}{2} \left[ \frac{\sin(Nu) \cos(u/2)}{\sin(u/2)} + \cos(Nu) \right]. \]

Then, equation (11) becomes

\[ \begin{aligned}
S_N(x) - f(x)
 &= \frac{1}{2\pi} \int_{-\pi}^{\pi} \left( f(u + x) - f(x) \right) \left[ \frac{\sin(Nu) \cos(u/2)}{\sin(u/2)} + \cos(Nu) \right] du \\
 &= \frac{1}{2\pi} \int_{-\pi}^{\pi} \left( f(u + x) - f(x) \right) \frac{\sin(Nu) \cos(u/2)}{\sin(u/2)} + \left( f(u + x) - f(x) \right) \cos(Nu)\,du \\
 &= \frac{1}{2\pi} \int_{-\pi}^{\pi} p_x(u) \sin(Nu)\,du + \frac{1}{2\pi} \int_{-\pi}^{\pi} q_x(u) \cos(Nu)\,du,
\end{aligned} \]

where in the last step we have set

\[ p_x(u) = \frac{\left( f(u + x) - f(x) \right) \cos(u/2)}{\sin(u/2)} \quad \text{and} \quad q_x(u) = f(u + x) - f(x). \]

Exercise 8.5.7. (a) First, argue why the integral involving q_x(u) tends to
zero as N → ∞.

(b) The first integral is a little more subtle because the function p_x(u) has the
sin(u/2) term in the denominator. Use the fact that f is differentiable at
x (and a familiar limit from calculus) to prove that the first integral goes
to zero as well. □


This completes the argument that S_N(x) → f(x) at any point x where
f is differentiable. If the derivative exists everywhere, then we get S_N → f
pointwise. If we add the assumption that f′ is continuous, then it is not too
difficult to show that the convergence is uniform. In fact, there is a very strong
relationship between the speed of convergence of the Fourier series and the
smoothness of f. The more derivatives f possesses, the faster the partial sums
S_N converge to f.


Cesàro Mean Convergence

Rather than pursue the proofs in this interesting direction, we will finish this
very brief introduction to Fourier series with a look at a different type of
convergence called Cesàro mean convergence.

Exercise 8.5.8. Prove that if a sequence of real numbers (x_n) converges, then
the arithmetic means

\[ y_n = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} \]

also converge to the same limit. Give an example to show that it is possible for
the sequence of means (y_n) to converge even if the original sequence (x_n) does
not.

The discussion preceding Theorem 8.5.3 is intended to create a kind of
reverence for the difficulties inherent in deciphering the behavior of Fourier series,
especially in the case where the function in question is not differentiable. It is
from this humble frame of mind that the following elegant result due to L. Fejér
in 1904 can best be appreciated.


Theorem 8.5.4 (Fejér’s Theorem). Let S_n(x) be the nth partial sum of the
Fourier series for a function f on (−π, π]. Define

\[ \sigma_N(x) = \frac{1}{N + 1} \sum_{n=0}^{N} S_n(x). \]

If f is continuous on (−π, π], then σ_N(x) → f(x) uniformly.


Proof. This argument is patterned after the proof of Theorem 8.5.3 but is
actually much simpler. In addition to the trigonometric formulas listed in Facts
1 and 2, we are going to need a version of Fact 2 for the sine function, which
looks like

\[ \sin(\theta) + \sin(2\theta) + \sin(3\theta) + \cdots + \sin(N\theta) = \frac{\sin\left(\frac{N\theta}{2}\right) \sin\left((N + 1)\frac{\theta}{2}\right)}{\sin\left(\frac{\theta}{2}\right)}. \]

Exercise 8.5.9. Use the previous identity to show that

\[ \frac{1/2 + D_1(\theta) + D_2(\theta) + \cdots + D_N(\theta)}{N + 1} = \frac{1}{2(N + 1)} \left[ \frac{\sin\left((N + 1)\frac{\theta}{2}\right)}{\sin\left(\frac{\theta}{2}\right)} \right]^2. \]

The expression in Exercise 8.5.9 is called the Fejér kernel and will be
denoted by F_N(θ). Analogous to the Dirichlet kernel D_N(θ) from the proof of
Theorem 8.5.3, F_N is used to greatly simplify the formula for σ_N(x).

Exercise 8.5.10. (a) Show that

$$\sigma_N(x) = \frac{1}{\pi} \int_{-\pi}^{\pi} f(u + x) F_N(u) \, du.$$

(b) Graph the function $F_N(u)$ for several values of N. Where is $F_N$ large,
and where is it close to zero? Compare this function to the Dirichlet
kernel $D_N(u)$. Now, prove that $F_N \to 0$ uniformly on any set of the form
$\{u : |u| \geq \delta\}$, where $\delta > 0$ is fixed (and u is restricted to the interval
$(-\pi, \pi]$).

(c) Prove that $\int_{-\pi}^{\pi} F_N(u) \, du = \pi$.

(d) To finish the proof of Fejér’s Theorem, first choose a $\delta > 0$ so that

$$|u| < \delta \quad \text{implies} \quad |f(x + u) - f(x)| < \epsilon.$$

Set up a single integral that represents the difference $\sigma_N(x) - f(x)$ and
divide this integral into sets where $|u| < \delta$ and $|u| \geq \delta$. Explain why it is
possible to make each of these integrals sufficiently small, independently
of the choice of x. □
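To see the uniform convergence in Fejér’s Theorem at work, here is a minimal numerical sketch using $f(x) = |x|$ on $(-\pi, \pi]$. It assumes the series is written as $a_0 + \sum (a_n \cos nx + b_n \sin nx)$ with $a_0 = \frac{1}{2\pi}\int f$, estimates the coefficients with a midpoint rule, and uses the fact that averaging $S_0, \ldots, S_N$ multiplies the nth harmonic by $1 - n/(N+1)$. The helper names and the test function are choices made for illustration only.

```python
import math

def fourier_coefficients(f, n_max: int, samples: int = 2000):
    """Midpoint-rule estimates of a_0, (a_n), (b_n) on (-pi, pi]."""
    h = 2 * math.pi / samples
    xs = [-math.pi + (j + 0.5) * h for j in range(samples)]
    fs = [f(x) for x in xs]
    a0 = sum(fs) * h / (2 * math.pi)
    a = [sum(v * math.cos(n * x) for v, x in zip(fs, xs)) * h / math.pi
         for n in range(1, n_max + 1)]
    b = [sum(v * math.sin(n * x) for v, x in zip(fs, xs)) * h / math.pi
         for n in range(1, n_max + 1)]
    return a0, a, b

def fejer_mean(a0, a, b, N: int, x: float) -> float:
    """sigma_N(x): the n-th harmonic appears in S_n, ..., S_N, so it gets
    weight (N + 1 - n)/(N + 1) = 1 - n/(N + 1) after averaging."""
    total = a0
    for n in range(1, N + 1):
        w = 1 - n / (N + 1)
        total += w * (a[n - 1] * math.cos(n * x) + b[n - 1] * math.sin(n * x))
    return total

def sup_error(f, N: int, a0, a, b, grid: int = 400) -> float:
    """Grid estimate of sup |sigma_N(x) - f(x)| over [-pi, pi]."""
    xs = [-math.pi + 2 * math.pi * j / grid for j in range(grid + 1)]
    return max(abs(fejer_mean(a0, a, b, N, x) - f(x)) for x in xs)

a0, a, b = fourier_coefficients(abs, 64)
errors = {N: sup_error(abs, N, a0, a, b) for N in (8, 16, 32, 64)}
```

The entries of `errors` should decrease as N increases. The estimate is taken over the whole interval at once, which is what "uniformly" asks for.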


Weierstrass  Approximation  Theorem 

The hard work of proving Fejér’s Theorem has many rewards, one of which
is  access  to  a  relatively  short  argument  for  a  profoundly  important  theorem 
discovered  by  Weierstrass  in  1885.  The  Weierstrass  Approximation  Theorem 
(WAT)  is  studied  in  depth  in  Section  6.7  and  is  restated  here  for  ease  of  reference. 

Theorem 6.7.1 (Weierstrass Approximation Theorem). Let $f : [a, b] \to \mathbf{R}$
be continuous. Given $\epsilon > 0$, there exists a polynomial $p(x)$ satisfying

$$|f(x) - p(x)| < \epsilon$$

for all $x \in [a, b]$.


Proof. We have actually seen a few special cases of this result before in
Section 6.6 on Taylor series. For instance, we showed that

$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \frac{x^9}{9!} - \cdots,$$

where the series converges uniformly on any bounded subset of R. Uniform
convergence of a series means the partial sums converge uniformly, and the
partial sums in this case are polynomials. Notice that this is precisely what
WAT asks us to prove, only we must do it for an arbitrary, continuous function
in place of sin(x).
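The claim about sin(x) is easy to observe numerically. The following sketch evaluates the polynomial partial sums of the Taylor series and estimates the largest error on $[-\pi, \pi]$ over a grid. The interval, grid size, and function names are assumptions made for the illustration.

```python
import math

def sin_taylor(x: float, terms: int) -> float:
    """Partial sum x - x^3/3! + x^5/5! - ... with `terms` nonzero terms."""
    total, term = 0.0, x
    for k in range(terms):
        total += term
        # Pass from (-1)^k x^(2k+1)/(2k+1)! to the next term of the series.
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total

def sup_error(terms: int, radius: float = math.pi, grid: int = 1000) -> float:
    """Grid estimate of sup |p(x) - sin(x)| over [-radius, radius]."""
    xs = [-radius + 2 * radius * j / grid for j in range(grid + 1)]
    return max(abs(sin_taylor(x, terms) - math.sin(x)) for x in xs)
```

Comparing `sup_error(3)`, `sup_error(5)`, and `sup_error(8)` shows the error shrinking across the whole interval as the degree of the polynomial grows.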

Using  Taylor  series  does  not  work  in  general.  To  construct  a  Taylor  series 
we  need  the  function  to  be  infinitely  differentiable — not  just  continuous — and 
even  in  this  case  we  might  get  a  series  that  either  does  not  converge  or  converges 
to  the  wrong  thing.  Taylor  series  are  a  valuable  tool,  however.  In  Section  6.7 
we used the Taylor series for $\sqrt{1 - x}$ as the starting point for a proper proof
of WAT. Fejér’s Theorem, in conjunction with the Taylor series for sin(x) and
cos(x),  provides  a  significant  shortcut  to  the  same  result. 

Exercise 8.5.11. (a) Use the fact that the Taylor series for sin(x) and cos(x)
converge uniformly on any compact set to prove WAT under the added
assumption that $[a, b]$ is $[0, \pi]$.

(b) Show how the case for an arbitrary interval $[a, b]$ follows from this one.

□


A  comment  from  Section  6.7  that  bears  repeating  relates  to  the  striking 
contrast  between  this  result  and  Weierstrass’s  demonstration  of  a  continuous 
nowhere-differentiable  function.  Although  there  exist  continuous  functions  that 
oscillate  so  wildly  that  they  fail  to  have  a  derivative  at  any  point,  these  unruly 
functions  are  always  uniformly  within  e  of  an  infinitely  differentiable  polynomial. 


Approximation  as  a  Unifying  Theme 

Viewing  the  last  section  of  this  chapter  as  a  kind  of  appendix  (included  to 
clear  up  some  loose  ends  from  Chapter  1  regarding  the  definition  of  the  real 
numbers), the Weierstrass Approximation Theorem makes for a fitting close to
our  introductory  survey  of  some  of  the  gems  of  analysis. 

The idea of approximation permeates the entire subject. Every real number
can be approximated with rational ones. The value of an infinite sum is
approximated with partial sums, and the value of a continuous function can
be approximated with its values nearby. A function is differentiable when a
straight line is a good approximation to the curve, and it is integrable when
finite sums of rectangles are a good approximation to the area under the curve.
Now, we learn that every continuous function can be approximated arbitrarily
well with a polynomial. In every case, the approximating objects are tangible
and well-understood, and the issue is how well these properties survive the
limiting process. By viewing the different infinities of mathematics through
pathways crafted out of finite objects, Weierstrass and the other founders of
analysis created a paradigm for how to extend the scope of mathematical
exploration deep into territory previously unattainable. Although our journey ends
here, the road is long and continues to be written.
8.6 A Construction of R From Q

This  entire  section  is  devoted  to  constructing  a  proof  for  the  following  theorem: 

Theorem  8.6.1  (Existence  of  the  Real  Numbers).  There  exists  an  ordered 
field  in  which  every  nonempty  set  that  is  bounded  above  has  a  least  upper  bound. 
In  addition,  this  field  contains  Q  as  a  subfield. 

There are a few terms to define before this statement can be properly
understood and proved, but it can essentially be paraphrased as “the real numbers
exist.” In Section 1.1, we encountered a major failing of the rational number
system as a place to do analysis. Without the square root of 2 (and uncountably
many other irrational numbers) we cannot confidently move from a Cauchy
sequence to its limit because in Q there is no guarantee that such a number
exists. (A review of Sections 1.1 and 1.3 is highly recommended at this point.)
The resolution we proposed in Chapter 1 came in the form of the Axiom of
Completeness, which we restate.

Axiom  of  Completeness.  Every  nonempty  set  of  real  numbers  that  is  bounded 
above  has  a  least  upper  bound. 

Now  let’s  be  clear  about  how  we  actually  proceeded  in  Chapter  1.  This  is 
the  property  that  distinguishes  Q  from  R,  but  by  referring  to  this  property  as 
an  axiom  we  were  making  the  point  that  it  was  not  something  to  be  proved. 
The  real  numbers  were  defined  simply  as  an  extension  of  the  rational  numbers 
in  which  bounded  sets  have  least  upper  bounds,  but  no  attempt  was  made  to 
demonstrate  that  such  an  extension  is  actually  possible.  Now,  the  time  has 
finally  come.  By  explicitly  building  the  real  numbers  from  the  rational  ones,  we 
will  be  able  to  demonstrate  that  the  Axiom  of  Completeness  does  not  need  to 
be  an  axiom  at  all;  it  is  a  theorem! 

There  is  something  ironic  about  having  the  final  section  of  this  book  be 
a  construction  of  the  number  system  that  has  been  the  underlying  subject  of 
every  preceding  page,  but  there  is  something  perfectly  apt  about  it  as  well. 
Through  eight  chapters  stretching  from  Cantor’s  Theorem  to  the  Baire  Category 
Theorem,  we  have  come  to  see  how  profoundly  the  addition  of  completeness 
changes  the  landscape.  We  all  grow  up  believing  in  the  existence  of  real  numbers, 
but  it  is  only  through  a  study  of  classical  analysis  that  we  become  aware  of  their 
elusive  and  enigmatic  nature.  It  is  because  completeness  matters  so  much,  and 
because  it  is  responsible  for  such  perplexing  phenomena,  that  we  should  now 
feel  obliged — compelled  really — to  go  back  to  the  beginning  and  verify  that  such 
a  thing  really  exists. 

As  we  mentioned  in  Chapter  1,  proceeding  in  this  order  puts  us  in  good 
historical  company.  The  pioneering  work  of  Cauchy,  Bolzano,  Abel,  Dirichlet, 
Weierstrass, and Riemann preceded (and in a very real sense led to) the host
of  rigorous  definitions  for  R  that  were  proposed  in  the  last  half  of  the  19th 
century.  Georg  Cantor  is  a  familiar  name  responsible  for  one  of  these  definitions, 
but  alternate  constructions  of  the  real  number  system  also  came  from  Charles 
Méray (1835-1911), Eduard Heine (1821-1881), and Richard Dedekind
(1831-1916). The formulation that follows is the one due to Dedekind. In a sense it
is  the  most  abstract  of  the  approaches,  but  it  is  the  most  appropriate  for  us 
because  the  verification  of  completeness  is  done  in  terms  of  least  upper  bounds. 

Dedekind  Cuts 

We  begin  this  discussion  by  assuming  that  the  rational  numbers  and  all  of  the 
familiar  properties  of  addition,  multiplication,  and  order  are  available  to  us.  At 
the  moment,  there  is  no  such  thing  as  a  real  number. 

Definition 8.6.2. A subset A of the rational numbers is called a cut if it
possesses the following three properties:

(c1) $A \neq \emptyset$ and $A \neq \mathbf{Q}$.

(c2) If $r \in A$, then A also contains every rational $q < r$.

(c3) A does not have a maximum; that is, if $r \in A$, then there exists $s \in A$
with $r < s$.
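One way to make the definition concrete is to model a subset of Q by its membership test and spot-check the cut properties on sample rationals. The sketch below is an illustration under that assumption, not a proof: it uses exact rational arithmetic, checks (c2) on a finite sample only, and produces an explicit larger member for (c3) in the set $\{t \in \mathbf{Q} : t^2 < 2 \text{ or } t < 0\}$. All names are hypothetical.

```python
from fractions import Fraction
from typing import Callable, Iterable

Cut = Callable[[Fraction], bool]   # membership test for a subset of Q

def rational_cut(r: Fraction) -> Cut:
    """Membership test for {t in Q : t < r}."""
    return lambda t: t < r

def sqrt2_cut(t: Fraction) -> bool:
    """Membership test for {t in Q : t^2 < 2 or t < 0}."""
    return t < 0 or t * t < 2

def closed_downward_on(cut: Cut, sample: Iterable[Fraction]) -> bool:
    """Spot-check (c2): if r is in the set, every sampled q < r is too."""
    pts = sorted(sample)
    return all(cut(q) for r in pts if cut(r) for q in pts if q < r)

def larger_member_sqrt2(t: Fraction) -> Fraction:
    """Witness for (c3): given a member t of sqrt2_cut, return a larger member."""
    if t < 1:
        return Fraction(1)            # 1 is a member because 1^2 < 2
    # For t >= 1 with t^2 < 2, s = (2t + 2)/(t + 2) satisfies
    # s - t = (2 - t^2)/(t + 2) > 0 and s^2 - 2 = 2(t^2 - 2)/(t + 2)^2 < 0.
    return (2 * t + 2) / (t + 2)

sample = [Fraction(n, 10) for n in range(-30, 31)]
```

For example, `larger_member_sqrt2(Fraction(7, 5))` returns `Fraction(24, 17)`, which is larger than 7/5 and still has square less than 2.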

Exercise 8.6.1. (a) Fix $r \in \mathbf{Q}$. Show that the set $C_r = \{t \in \mathbf{Q} : t < r\}$ is a
cut.

The temptation to think of all cuts as being of this form should be avoided.
Which of the following subsets of Q are cuts?

(b) $S = \{t \in \mathbf{Q} : t \leq 2\}$

(c) $T = \{t \in \mathbf{Q} : t^2 < 2 \text{ or } t < 0\}$

(d) $U = \{t \in \mathbf{Q} : t^2 \leq 2 \text{ or } t < 0\}$

Exercise 8.6.2. Let A be a cut. Show that if $r \in A$ and $s \notin A$, then $r < s$.

To  dispel  any  suspense,  let’s  get  right  to  the  point. 

Definition  8.6.3.  Define  the  real  numbers  R  to  be  the  set  of  all  cuts  in  Q. 

This  may  feel  awkward  at  first — real  numbers  should  be  numbers,  not  sets 
of  rational  numbers.  The  counterargument  here  is  that  when  working  on  the 
foundations  of  mathematics,  sets  are  about  the  most  basic  building  blocks  we 
have.  We  have  defined  a  set  R  whose  elements  are  subsets  of  Q.  We  now  must 
set  about  the  task  of  imposing  some  algebraic  structure  on  R  that  behaves  in 
a  way  familiar  to  us.  What  exactly  does  this  entail?  If  we  are  serious  about 
constructing  a  proof  for  Theorem  8.6.1,  we  need  to  be  more  specific  about  what 
we  mean  by  an  “ordered  field.” 
Field and Order Properties

Given a set F and two elements $x, y \in F$, an operation on F is a function that
takes the ordered pair $(x, y)$ to a third element $z \in F$. Writing $x + y$ or $xy$
to represent different operations reminds us of the two operations that we are
trying to emulate.

Definition 8.6.4. A set F is a field if there exist two operations, addition
$(x + y)$ and multiplication $(xy)$, that satisfy the following list of conditions:

(f1) (commutativity) $x + y = y + x$ and $xy = yx$ for all $x, y \in F$.

(f2) (associativity) $(x + y) + z = x + (y + z)$ and $(xy)z = x(yz)$ for all $x, y, z \in F$.

(f3) (identities exist) There exist two special elements 0 and 1 with $0 \neq 1$ such
that $x + 0 = x$ and $x1 = x$ for all $x \in F$.

(f4) (inverses exist) Given $x \in F$, there exists an element $-x \in F$ such that
$x + (-x) = 0$. If $x \neq 0$, there exists an element $x^{-1}$ such that $xx^{-1} = 1$.

(f5) (distributive property) $x(y + z) = xy + xz$ for all $x, y, z \in F$.

Exercise  8.6.3.  Using  the  usual  definitions  of  addition  and  multiplication, 
determine  which  of  these  properties  are  possessed  by  N,  Z,  and  Q,  respectively. 

Although we will not pursue this here in any depth, all of the familiar
algebraic manipulations in Q (e.g., $x + y = x + z$ implies $y = z$) can be derived
from this short list of properties.
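As one worked instance of such a derivation, the cancellation law mentioned above can be obtained from (f1) through (f4) alone. The chain below is a sketch that assumes $x + y = x + z$ and labels each step with the property it uses, with (f2) applied in both directions.

```latex
\begin{align*}
y &= y + 0            && \text{(f3)} \\
  &= y + (x + (-x))   && \text{(f4)} \\
  &= (y + x) + (-x)   && \text{(f2)} \\
  &= (x + y) + (-x)   && \text{(f1)} \\
  &= (x + z) + (-x)   && \text{(hypothesis)} \\
  &= (z + x) + (-x)   && \text{(f1)} \\
  &= z + (x + (-x))   && \text{(f2)} \\
  &= z + 0 = z        && \text{(f4), (f3)}
\end{align*}
```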

Definition 8.6.5. An ordering on a set F is a relation, represented by $\leq$, with
the following three properties:

(o1) For arbitrary $x, y \in F$, at least one of the statements $x \leq y$ or $y \leq x$ is
true.

(o2) If $x \leq y$ and $y \leq x$, then $x = y$.

(o3) If $x \leq y$ and $y \leq z$, then $x \leq z$.

We will sometimes write $y \geq x$ in place of $x \leq y$. The strict inequality $x < y$
is used to mean $x \leq y$ but $x \neq y$.

A field F is called an ordered field if F is endowed with an ordering $\leq$ that
satisfies

(o4) If $y \leq z$, then $x + y \leq x + z$.

(o5) If $x \geq 0$ and $y \geq 0$, then $xy \geq 0$.

Let’s  take  stock  of  where  we  are.  To  prove  Theorem  8.6.1,  we  are  accepting 
as  given  that  the  rational  numbers  are  an  ordered  field.  We  have  defined  the  real 
numbers  R  to  be  the  collection  of  cuts  in  Q,  and  the  challenge  now  is  to  invent 
addition,  multiplication,  and  an  ordering  so  that  each  possesses  the  properties 
outlined in the preceding two definitions. The easiest of these is the ordering.
Let A and B be two arbitrary elements of R.

Define $A \leq B$ to mean $A \subseteq B$.

Exercise 8.6.4. Show that this defines an ordering on R by verifying properties
(o1), (o2), and (o3) from Definition 8.6.5.

Algebra  in  R 

Given A and B in R, define

$$A + B = \{a + b : a \in A \text{ and } b \in B\}.$$

Before checking properties (f1)-(f4) for addition, we must first verify that our
definition really defines an operation. Is $A + B$ actually a cut? To get the flavor
of how these arguments look, let’s verify property (c2) of Definition 8.6.2 for
the set $A + B$.

Let $a + b \in A + B$ be arbitrary and let $s \in \mathbf{Q}$ satisfy $s < a + b$. Then,
$s - b < a$, which implies that $s - b \in A$ because A is a cut. But then

$$s = (s - b) + b \in A + B,$$

and (c2) is proved.
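The argument for (c2) is constructive: from $a \in A$, $b \in B$, and $s < a + b$ it produces an explicit decomposition of s. A minimal sketch of that step follows, modeling each cut by a membership test on exact rationals. The function name and the two example cuts are hypothetical choices made for the illustration.

```python
from fractions import Fraction
from typing import Callable, Tuple

Cut = Callable[[Fraction], bool]   # membership test for a cut

def split_below(in_A: Cut, in_B: Cut, a: Fraction, b: Fraction,
                s: Fraction) -> Tuple[Fraction, Fraction]:
    """Given a in A, b in B and s < a + b, return (a', b') with
    a' in A, b' in B and a' + b' == s, following the (c2) argument."""
    assert in_A(a) and in_B(b) and s < a + b
    a_prime = s - b              # s - b < a, so (c2) for A places it in A
    assert in_A(a_prime)
    return a_prime, b

# Hypothetical example cuts: A = {t : t < 1/2}, B = {t : t^2 < 2 or t < 0}.
in_A: Cut = lambda t: t < Fraction(1, 2)
in_B: Cut = lambda t: t < 0 or t * t < 2

a1, b1 = split_below(in_A, in_B, Fraction(2, 5), Fraction(7, 5), Fraction(-3))
```

Here `a1 + b1` equals -3, with `a1` in A and `b1` in B, so -3 belongs to $A + B$ as the argument predicts.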

Exercise 8.6.5. (a) Show that (c1) and (c3) also hold for $A + B$. Conclude
that $A + B$ is a cut.

(b) Check that addition in R is commutative (f1) and associative (f2).

(c) Show that property (o4) holds.

(d) Show that the cut

$$O = \{p \in \mathbf{Q} : p < 0\}$$

successfully plays the role of the additive identity (f3). (Showing $A + O = A$
amounts to proving that these two sets are the same. The standard
way to prove such a thing is to show two inclusions: $A + O \subseteq A$ and
$A \subseteq A + O$.)

What about additive inverses? Given $A \in \mathbf{R}$, we must produce a cut $-A$
with the property that $A + (-A) = O$. This is a bit more difficult than it
sounds. Conceptually, the cut $-A$ consists of all rational numbers less than
$-\sup A$. The problem is how to define this set without using suprema, which
are strictly off limits at the moment. (We are building the field in which they
exist!)

Given $A \in \mathbf{R}$, define

$$-A = \{r \in \mathbf{Q} : \text{there exists } t \notin A \text{ with } t < -r\}.$$
[Figure: number line illustrating the cuts A and −A, with the points t, r, 0, and −r marked.]

Exercise 8.6.6. (a) Prove that $-A$ defines a cut.

(b) What goes wrong if we set $-A = \{r \in \mathbf{Q} : -r \notin A\}$?

(c) If $a \in A$ and $r \in -A$, show $a + r \in O$. This shows $A + (-A) \subseteq O$. Now,
finish the proof of property (f4) for addition in Definition 8.6.4.

Although the ideas are similar, the technical difficulties increase when we
try to create a definition for multiplication in R. This is largely due to the fact
that the product of two negative numbers is positive. The standard method of
attack is first to define multiplication on the non-negative cuts.

Given $A \geq O$ and $B \geq O$ in R, define the product

$$AB = \{ab : a \in A, b \in B \text{ with } a, b \geq 0\} \cup \{q \in \mathbf{Q} : q < 0\}.$$


Exercise 8.6.7. (a) Show that AB is a cut and that property (o5) holds.

(b) Propose a good candidate for the multiplicative identity (1) on R and
show that this works for all cuts $A \geq O$.

(c) Show the distributive property (f5) holds for non-negative cuts.


Products involving at least one negative factor can be defined in terms of the
product of two positive cuts by observing that $-A \geq O$ whenever $A \leq O$. (Given
$A \leq O$, property (o4) implies $A + (-A) \leq O + (-A)$, which yields $O \leq -A$.)
For any A and B in R, define

$$AB = \begin{cases}
\text{as given} & \text{if } A \geq O \text{ and } B \geq O \\
-[A(-B)] & \text{if } A \geq O \text{ and } B < O \\
-[(-A)B] & \text{if } A < O \text{ and } B \geq O \\
(-A)(-B) & \text{if } A < O \text{ and } B < O.
\end{cases}$$


Verifying  that  multiplication  defined  in  this  way  satisfies  all  the  required  field 
properties  is  important  but  uneventful.  The  proofs  generally  fall  into  cases  for 
when  terms  are  positive  or  negative  and  follow  a  pattern  similar  to  those  for 
addition.  We  will  leave  them  as  an  unofficial  exercise  and  move  on  to  the  punch 
line. 


Least  Upper  Bounds 

Having  proved  that  R  is  an  ordered  field,  we  now  set  our  sights  on  showing 
that  this  field  is  complete.  We  defined  completeness  in  Chapter  1  in  terms  of 
least  upper  bounds.  Here  is  a  summary  of  the  relevant  definitions  from  that 
discussion. 
Definition 8.6.6. A set $\mathcal{A} \subseteq \mathbf{R}$ is bounded above if there exists a $B \in \mathbf{R}$ such
that $A \leq B$ for all $A \in \mathcal{A}$. The number B is called an upper bound for $\mathcal{A}$.

A real number $S \in \mathbf{R}$ is the least upper bound for a set $\mathcal{A} \subseteq \mathbf{R}$ if it meets
the following two criteria:

(i) S is an upper bound for $\mathcal{A}$ and

(ii) if B is any upper bound for $\mathcal{A}$, then $S \leq B$.

Exercise 8.6.8. Let $\mathcal{A} \subseteq \mathbf{R}$ be nonempty and bounded above, and let S be
the union of all $A \in \mathcal{A}$.

(a) First, prove that $S \in \mathbf{R}$ by showing that it is a cut.

(b) Now, show that S is the least upper bound for $\mathcal{A}$.
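A single concrete family may help fix the idea that a union of cuts serves as the least upper bound. The sketch below takes the hypothetical family $A_n = \{t \in \mathbf{Q} : t < 1 - 1/n\}$ for $n = 1, 2, 3, \ldots$ and shows that a rational q lies in the union exactly when $q < 1$, by exhibiting an index n that works. The family and the function names are assumptions made for the illustration, and this is not a substitute for the general argument.

```python
from fractions import Fraction
from typing import Optional

def in_A(n: int, q: Fraction) -> bool:
    """Membership in the rational cut A_n = {t in Q : t < 1 - 1/n}."""
    return q < 1 - Fraction(1, n)

def witness(q: Fraction) -> Optional[int]:
    """Return an n with q in A_n when q < 1; return None when q >= 1,
    since then q >= 1 > 1 - 1/n for every n and q lies in no A_n."""
    if q >= 1:
        return None
    # q in A_n requires 1/n < 1 - q, that is, n > 1/(1 - q).
    n = int(1 / (1 - q)) + 1
    assert in_A(n, q)
    return n
```

For instance, `witness(Fraction(99, 100))` returns 101, while `witness(Fraction(1))` returns `None`. In this example the union is the rational cut of all t less than 1.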

This  finishes  the  proof  that  R  is  complete.  Notice  that  we  could  have  proved 
that  least  upper  bounds  exist  immediately  after  defining  the  ordering  on  R,  but 
saving  it  for  last  gives  it  the  privileged  place  in  the  argument  it  deserves.  There 
is,  however,  still  one  loose  end  to  sew  up.  The  statement  of  Theorem  8.6.1 
mentions  that  our  complete  ordered  field  contains  Q  as  a  subfield.  This  is  a 
slight  abuse  of  language.  What  it  should  say  is  that  R  contains  a  subfield  that 
looks  and  acts  exactly  like  Q. 

Exercise 8.6.9. Consider the collection of so-called “rational” cuts of the form

$$C_r = \{t \in \mathbf{Q} : t < r\}$$

where $r \in \mathbf{Q}$. (See Exercise 8.6.1.)

(a) Show that $C_r + C_s = C_{r+s}$ for all $r, s \in \mathbf{Q}$. Verify $C_r C_s = C_{rs}$ for the
case when $r, s \geq 0$.

(b) Show that $C_r \leq C_s$ if and only if $r \leq s$ in Q.

Cantor’s  Approach 

As  a  way  of  giving  Georg  Cantor  the  last  word,  let’s  briefly  look  at  his  very 
different  approach  to  constructing  R  out  of  Q.  One  of  the  many  equivalent 
ways  to  characterize  completeness  is  with  the  assertion  that  “Cauchy  sequences 
converge.”  Given  a  Cauchy  sequence  of  rational  numbers,  we  are  now  well  aware 
that  this  sequence  may  converge  to  a  value  not  in  Q.  Just  as  before,  the  goal  is 
to create something, which we will call a real number, that can serve as the limit
of this sequence. Cantor’s idea was essentially to define a real number to be the
entire Cauchy sequence. The first problem one encounters with this approach
is the realization that two different Cauchy sequences can converge to the same
real number. For this reason, the elements in R are more appropriately defined
as equivalence classes of Cauchy sequences where two sequences $(x_n)$ and $(y_n)$
are in the same equivalence class if and only if $(x_n - y_n) \to 0$.
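The equivalence relation can be seen in a small experiment with two different sequences of rationals that both approach the square root of 2. The sketch below uses a Newton-style iteration and a bisection scheme, which are choices made for this illustration and are not taken from the text, and computes the differences $x_n - y_n$ in exact rational arithmetic.

```python
from fractions import Fraction
from typing import List

def newton_sqrt2(steps: int) -> List[Fraction]:
    """x_0 = 2, x_{n+1} = (x_n + 2/x_n)/2: rationals decreasing toward sqrt(2)."""
    xs = [Fraction(2)]
    for _ in range(steps):
        x = xs[-1]
        xs.append((x + 2 / x) / 2)
    return xs

def bisect_sqrt2(steps: int) -> List[Fraction]:
    """Left endpoints of nested intervals [lo, hi] with lo^2 < 2 < hi^2."""
    lo, hi = Fraction(1), Fraction(2)
    ys = [lo]
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid * mid < 2:
            lo = mid
        else:
            hi = mid
        ys.append(lo)
    return ys

xs, ys = newton_sqrt2(10), bisect_sqrt2(10)
gaps = [x - y for x, y in zip(xs, ys)]   # exact rational differences x_n - y_n
```

The two sequences never agree term by term, yet the entries of `gaps` become small, so in Cantor's construction they would represent the same real number.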
As with Dedekind’s approach, it can be momentarily disorienting to
supplant our relatively simple notion of a real number as a decimal expansion with
something as unruly as an equivalence class of Cauchy sequences. But what
exactly do we mean by a decimal expansion? And how are we to understand
the number 1/2 as both .5000... and .4999...? We leave it as an exercise.